Journal article

Modeling Large Sparse Data for Feature Selection: Hospital Admission Predictions of the Dementia Patients Using Primary Care Electronic Health Records

Gavin Tsang, Shang-ming Zhou, Xianghua Xie

IEEE Journal of Translational Engineering in Health and Medicine, Volume: 9, Pages: 1 - 13

Swansea University Authors: Gavin Tsang, Shang-ming Zhou, Xianghua Xie

  • 55654.VOR.pdf

    PDF | Version of Record

    This work is licensed under a Creative Commons Attribution 4.0 License.


Abstract

A growing elderly population suffering from incurable, chronic conditions such as dementia presents a continual strain on medical services due to mental impairment paired with high comorbidity, resulting in increased hospitalization risk. The identification of at-risk individuals allows for preven...


Published in: IEEE Journal of Translational Engineering in Health and Medicine
ISSN: 2168-2372
Published: Institute of Electrical and Electronics Engineers (IEEE) 2021

URI: https://cronfa.swan.ac.uk/Record/cronfa55654
first_indexed 2020-12-30T12:47:43Z
last_indexed 2021-01-29T04:20:16Z
id cronfa55654
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><datestamp>2021-01-28T16:09:45.7338522</datestamp><bib-version>v2</bib-version><id>55654</id><entry>2020-11-12</entry><title>Modeling Large Sparse Data for Feature Selection: Hospital Admission Predictions of the Dementia Patients Using Primary Care Electronic Health Records</title><swanseaauthors><author><sid>ca887fecde0f72eaf96f0785f018113f</sid><firstname>Gavin</firstname><surname>Tsang</surname><name>Gavin Tsang</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>118578a62021ba8ef61398da0a8750da</sid><ORCID>0000-0002-0719-9353</ORCID><firstname>Shang-ming</firstname><surname>Zhou</surname><name>Shang-ming Zhou</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>b334d40963c7a2f435f06d2c26c74e11</sid><ORCID>0000-0002-2701-8660</ORCID><firstname>Xianghua</firstname><surname>Xie</surname><name>Xianghua Xie</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2020-11-12</date><abstract>A growing elderly population suffering from incurable, chronic conditions such as dementia presents a continual strain on medical services due to mental impairment paired with high comorbidity, resulting in increased hospitalization risk. The identification of at-risk individuals allows for preventative measures to alleviate said strain. Electronic health records provide opportunity for big data analysis to address such applications. 
Such data, however, provides a challenging problem space for traditional statistics and machine learning due to high dimensionality and sparse data elements. This paper proposes a novel machine learning methodology: entropy regularization with ensemble deep neural networks (ECNN), which simultaneously provides high predictive performance of hospitalization of patients with dementia whilst enabling an interpretable heuristic analysis of the model architecture, able to identify individual features of importance within a large feature domain space. Experimental results on health records containing 54,647 features were able to identify 10 event indicators within a patient timeline: a collection of diagnostic events, medication prescriptions and procedural events, the highest ranked being essential hypertension. The resulting subset was still able to provide a highly competitive hospitalization prediction (Accuracy: 0.759) as compared to the full feature domain (Accuracy: 0.755) or traditional feature selection techniques (Accuracy: 0.737), a significant reduction in feature size. The discovery and heuristic evidence of correlation provide evidence for further clinical study of said medical events as potential novel indicators. 
There also remains great potential for adaption of ECNN within other medical big data domains as a data mining tool for novel risk factor identification.</abstract><type>Journal Article</type><journal>IEEE Journal of Translational Engineering in Health and Medicine</journal><volume>9</volume><journalNumber/><paginationStart>1</paginationStart><paginationEnd>13</paginationEnd><publisher>Institute of Electrical and Electronics Engineers (IEEE)</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic>2168-2372</issnElectronic><keywords>Deep learning; dementia; electronic health records; feature selection; hospitalization; machine learning; risk factors; weight regularization.</keywords><publishedDay>1</publishedDay><publishedMonth>1</publishedMonth><publishedYear>2021</publishedYear><publishedDate>2021-01-01</publishedDate><doi>10.1109/jtehm.2020.3040236</doi><url/><notes/><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><apcterm/><lastEdited>2021-01-28T16:09:45.7338522</lastEdited><Created>2020-11-12T21:24:34.3472972</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>Gavin</firstname><surname>Tsang</surname><order>1</order></author><author><firstname>Shang-ming</firstname><surname>Zhou</surname><orcid>0000-0002-0719-9353</orcid><order>2</order></author><author><firstname>Xianghua</firstname><surname>Xie</surname><orcid>0000-0002-2701-8660</orcid><order>3</order></author></authors><documents><document><filename>55654__18937__1f52fe634ae74f2f962fd269c164a030.pdf</filename><originalFilename>55654.VOR.pdf</originalFilename><uploaded>2020-12-30T12:46:37.3824609</uploaded><type>Output</type><contentLength>1914398</contentLength><contentType>application/pdf</contentType><version>Version of 
Record</version><cronfaStatus>true</cronfaStatus><documentNotes>This work is licensed under a Creative Commons Attribution 4.0 License.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs><OutputDur><Id>32</Id><DataControllerName>SAIL data bank</DataControllerName><IsDataAvailableOnline>true</IsDataAvailableOnline><DataNotAvailableOnlineReasonId xsi:nil="true"/><DurUrl>https://saildatabank.com</DurUrl><IsDurRestrictions>true</IsDurRestrictions><DurRestrictionReasonId xsi:nil="true"/><DurEmbargoDate xsi:nil="true"/></OutputDur></OutputDurs></rfc1807>
title Modeling Large Sparse Data for Feature Selection: Hospital Admission Predictions of the Dementia Patients Using Primary Care Electronic Health Records
author_id_str_mv ca887fecde0f72eaf96f0785f018113f
118578a62021ba8ef61398da0a8750da
b334d40963c7a2f435f06d2c26c74e11
author_id_fullname_str_mv ca887fecde0f72eaf96f0785f018113f_***_Gavin Tsang
118578a62021ba8ef61398da0a8750da_***_Shang-ming Zhou
b334d40963c7a2f435f06d2c26c74e11_***_Xianghua Xie
author Gavin Tsang
Shang-ming Zhou
Xianghua Xie
format Journal article
container_title IEEE Journal of Translational Engineering in Health and Medicine
container_volume 9
container_start_page 1
publishDate 2021
institution Swansea University
issn 2168-2372
doi_str_mv 10.1109/jtehm.2020.3040236
publisher Institute of Electrical and Electronics Engineers (IEEE)
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science
document_store_str 1
active_str 0
description A growing elderly population suffering from incurable, chronic conditions such as dementia presents a continual strain on medical services due to mental impairment paired with high comorbidity, resulting in increased hospitalization risk. The identification of at-risk individuals allows for preventative measures to alleviate said strain. Electronic health records provide opportunity for big data analysis to address such applications. Such data, however, provides a challenging problem space for traditional statistics and machine learning due to high dimensionality and sparse data elements. This paper proposes a novel machine learning methodology: entropy regularization with ensemble deep neural networks (ECNN), which simultaneously provides high predictive performance of hospitalization of patients with dementia whilst enabling an interpretable heuristic analysis of the model architecture, able to identify individual features of importance within a large feature domain space. Experimental results on health records containing 54,647 features were able to identify 10 event indicators within a patient timeline: a collection of diagnostic events, medication prescriptions and procedural events, the highest ranked being essential hypertension. The resulting subset was still able to provide a highly competitive hospitalization prediction (Accuracy: 0.759) as compared to the full feature domain (Accuracy: 0.755) or traditional feature selection techniques (Accuracy: 0.737), a significant reduction in feature size. The discovery and heuristic evidence of correlation provide evidence for further clinical study of said medical events as potential novel indicators. There also remains great potential for adaption of ECNN within other medical big data domains as a data mining tool for novel risk factor identification.
published_date 2021-01-01T04:10:02Z