

Time-Series Embedded Feature Selection Using Deep Learning: Data Mining Electronic Health Records for Novel Biomarkers / GAVIN TSANG

Swansea University Author: GAVIN TSANG

DOI (Published version): 10.23889/SUthesis.61814

Abstract

As health information technologies continue to advance, routine collection and digitisation of patient health records in the form of electronic health records present an ideal opportunity for data mining and exploratory analysis of biomarkers and risk factors indicative of a potentially diverse d...


Published: Swansea 2022
Institution: Swansea University
Degree level: Doctoral
Degree name: Ph.D
Supervisor: Xie, Xianghua
URI: https://cronfa.swan.ac.uk/Record/cronfa61814
first_indexed 2022-11-08T11:35:23Z
last_indexed 2023-01-13T19:22:49Z
id cronfa61814
recordtype RisThesis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2022-11-08T11:44:39.3198379</datestamp><bib-version>v2</bib-version><id>61814</id><entry>2022-11-08</entry><title>Time-Series Embedded Feature Selection Using Deep Learning: Data Mining Electronic Health Records for Novel Biomarkers</title><swanseaauthors><author><sid>35ba5aa06ef4ebb54bfac247a47c1022</sid><firstname>GAVIN</firstname><surname>TSANG</surname><name>GAVIN TSANG</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2022-11-08</date><abstract>As health information technologies continue to advance, routine collection and digitisation of patient health records in the form of electronic health records present an ideal opportunity for data mining and exploratory analysis of biomarkers and risk factors indicative of a potentially diverse domain of patient outcomes. Patient records have continually become more widely available through various initiatives enabling open access whilst maintaining critical patient privacy. In spite of such progress, health records remain not widely adopted within the current clinical statistical analysis domain due to challenging issues derived from such &#x201C;big data&#x201D;. Deep learning based temporal modelling approaches present an ideal solution to health record challenges through automated self-optimisation of representation learning, able to manageably compose the high-dimensional domain of patient records into data representations able to model complex data associations. Such representations can serve to condense and reduce dimensionality to emphasise feature sparsity and importance through novel embedded feature selection approaches. 
Accordingly, application to patient records enables complex modelling and analysis of the full domain of clinical features to select biomarkers of predictive relevance. Firstly, we propose a novel entropy regularised neural network ensemble able to highlight risk factors associated with hospitalisation risk of individuals with dementia. Its application was able to reduce a large domain of unique medical events to a small set of relevant risk factors able to maintain hospitalisation discrimination. Following on, we continue our work on ensemble architecture approaches with a novel cascading LSTM ensemble to predict severe sepsis onset within critical patients in an ICU critical care centre. We demonstrate state-of-the-art performance capabilities able to outperform that of current related literature. Finally, we propose a novel embedded feature selection application dubbed 1D convolution feature selection using sparsity regularisation. Said methodology was evaluated on both domains of dementia and sepsis prediction objectives to highlight model capability and generalisability. 
We further report a selection of potential biomarkers for the aforementioned case study objectives highlighting clinical relevance and potential novelty value for future clinical analysis. Accordingly, we demonstrate the effective capability of embedded feature selection approaches through the application of temporal based deep learning architectures in the discovery of effective biomarkers across a variety of challenging clinical applications.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Machine Learning, Feature Selection, Electronic Health Record, Deep Learning, Sepsis, Dementia</keywords><publishedDay>7</publishedDay><publishedMonth>11</publishedMonth><publishedYear>2022</publishedYear><publishedDate>2022-11-07</publishedDate><doi>10.23889/SUthesis.61814</doi><url/><notes>ORCiD identifier: https://orcid.org/0000-0002-2035-1452</notes><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Xie, Xianghua</supervisor><degreelevel>Doctoral</degreelevel><degreename>Ph.D</degreename><degreesponsorsfunders>EPSRC (EP/N028139/1)</degreesponsorsfunders><apcterm/><funders/><projectreference/><lastEdited>2022-11-08T11:44:39.3198379</lastEdited><Created>2022-11-08T11:31:55.3132942</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer 
Science</level></path><authors><author><firstname>GAVIN</firstname><surname>TSANG</surname><order>1</order></author></authors><documents><document><filename>61814__25688__4674b8f75f814726ad5879fe11c7e2e7.pdf</filename><originalFilename>Tsang_Gavin_PhD_Thesis_Final_Redacted_Signature.pdf</originalFilename><uploaded>2022-11-08T11:42:00.7407336</uploaded><type>Output</type><contentLength>1670287</contentLength><contentType>application/pdf</contentType><version>E-Thesis &#x2013; open access</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: The author, Gavin Tsang, 2022.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title Time-Series Embedded Feature Selection Using Deep Learning: Data Mining Electronic Health Records for Novel Biomarkers
author_id_str_mv 35ba5aa06ef4ebb54bfac247a47c1022
author_id_fullname_str_mv 35ba5aa06ef4ebb54bfac247a47c1022_***_GAVIN TSANG
author GAVIN TSANG
author2 GAVIN TSANG
format E-Thesis
publishDate 2022
institution Swansea University
doi_str_mv 10.23889/SUthesis.61814
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science
document_store_str 1
active_str 0
description As health information technologies continue to advance, routine collection and digitisation of patient health records in the form of electronic health records present an ideal opportunity for data mining and exploratory analysis of biomarkers and risk factors indicative of a potentially diverse domain of patient outcomes. Patient records have continually become more widely available through various initiatives enabling open access whilst maintaining critical patient privacy. In spite of such progress, health records remain not widely adopted within the current clinical statistical analysis domain due to challenging issues derived from such “big data”. Deep learning based temporal modelling approaches present an ideal solution to health record challenges through automated self-optimisation of representation learning, able to manageably compose the high-dimensional domain of patient records into data representations able to model complex data associations. Such representations can serve to condense and reduce dimensionality to emphasise feature sparsity and importance through novel embedded feature selection approaches. Accordingly, application to patient records enables complex modelling and analysis of the full domain of clinical features to select biomarkers of predictive relevance. Firstly, we propose a novel entropy regularised neural network ensemble able to highlight risk factors associated with hospitalisation risk of individuals with dementia. Its application was able to reduce a large domain of unique medical events to a small set of relevant risk factors able to maintain hospitalisation discrimination. Following on, we continue our work on ensemble architecture approaches with a novel cascading LSTM ensemble to predict severe sepsis onset within critical patients in an ICU critical care centre. 
We demonstrate state-of-the-art performance capabilities able to outperform that of current related literature. Finally, we propose a novel embedded feature selection application dubbed 1D convolution feature selection using sparsity regularisation. Said methodology was evaluated on both domains of dementia and sepsis prediction objectives to highlight model capability and generalisability. We further report a selection of potential biomarkers for the aforementioned case study objectives highlighting clinical relevance and potential novelty value for future clinical analysis. Accordingly, we demonstrate the effective capability of embedded feature selection approaches through the application of temporal based deep learning architectures in the discovery of effective biomarkers across a variety of challenging clinical applications.
published_date 2022-11-07T05:21:04Z