Natural language processing (NLP) for clinical information extraction and healthcare research

FONFERKO-SHADRACH, BEATA

doi:10.23889/SUthesis.65061

E-Thesis

Swansea University Author: BEATA FONFERKO-SHADRACH

PDF | E-Thesis – open access

Copyright: The Author, Beata Fonferko-Shadrach, 2023. Distributed under the terms of a Creative Commons Attribution Non Commercial 4.0 License (CC BY-NC 4.0).

DOI (Published version): 10.23889/SUthesis.65061

Abstract

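Introduction: Epilepsy is a common disease with multiple comorbidities. Routinely collected health care data have been successfully used in epilepsy research, but they lack the level of detail needed for in-depth study of complex interactions between the aetiology, comorbidities, and treatment that affect patient outcomes. The aim of this work is to use natural language processing (NLP) technology to create detailed disease-specific datasets derived from the free text of clinic letters in order to enrich the information that is already available. Method: An NLP pipeline for the extraction of epilepsy clinical text (ExECT) was redeveloped to extract a wider range of variables. A gold standard annotation set for epilepsy clinic letters was created for the validation of the ExECT v2 output. A set of clinic letters from the Epi25 study was processed and the datasets produced were validated against Swansea Neurology Biobank records. A data linkage study investigating genetic influences on epilepsy outcomes using GP and hospital records was supplemented with the seizure frequency dataset produced by ExECT v2. Results: The validation of ExECT v2 produced overall precision, recall, and F1 score of 0.90, 0.86, and 0.88, respectively. A method of uploading, annotating, and linking genetic variant datasets within the SAIL databank was established. No significant differences in the genetic burden of rare and potentially damaging variants were observed between the individuals with vs without unscheduled admissions, and between individuals on monotherapy vs polytherapy. No significant difference was observed in the genetic burden between people who were seizure free for over a year and those who experienced at least one seizure a year. Conclusion: This work presents successful extraction of epilepsy clinical information and explores how this information can be used in epilepsy research. The approach taken in the development of ExECT v2, and the research linking the NLP outputs, routinely collected health care data, and genetics set the way for wider research.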

Published:	Swansea, Wales, UK 2023
Institution:	Swansea University
Degree level:	Doctoral
Degree name:	Ph.D
Supervisor:	Halcox, Julian and Pickrell, William Owen
URI:	https://cronfa.swan.ac.uk/Record/cronfa65061

first_indexed	2023-11-21T15:12:28Z
last_indexed	2024-11-25T14:15:17Z
id	cronfa65061
recordtype	RisThesis
fullrecord	<?xml version="1.0"?><rfc1807><datestamp>2023-11-21T15:24:54.3701230</datestamp><bib-version>v2</bib-version><id>65061</id><entry>2023-11-21</entry><title>Natural language processing (NLP) for clinical information extraction and healthcare research</title><swanseaauthors><author><sid>2d17cc2bf75b0aa1d7ebead5778f8d88</sid><firstname>BEATA</firstname><surname>FONFERKO-SHADRACH</surname><name>BEATA FONFERKO-SHADRACH</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2023-11-21</date><abstract>Introduction: Epilepsy is a common disease with multiple comorbidities. Routinely collected health care data have been successfully used in epilepsy research, but they lack the level of detail needed for in-depth study of complex interactions between the aetiology, comorbidities, and treatment that affect patient outcomes. The aim of this work is to use natural language processing (NLP) technology to create detailed disease-specific datasets derived from the free text of clinic letters in order to enrich the information that is already available. Method: An NLP pipeline for the extraction of epilepsy clinical text (ExECT) was redeveloped to extract a wider range of variables. A gold standard annotation set for epilepsy clinic letters was created for the validation of the ExECT v2 output. A set of clinic letters from the Epi25 study was processed and the datasets produced were validated against Swansea Neurology Biobank records. A data linkage study investigating genetic influences on epilepsy outcomes using GP and hospital records was supplemented with the seizure frequency dataset produced by ExECT v2. Results: The validation of ExECT v2 produced overall precision, recall, and F1 score of 0.90, 0.86, and 0.88, respectively. A method of uploading, annotating, and linking genetic variant datasets within the SAIL databank was established. 
No significant differences in the genetic burden of rare and potentially damaging variants were observed between the individuals with vs without unscheduled admissions, and between individuals on monotherapy vs polytherapy. No significant difference was observed in the genetic burden between people who were seizure free for over a year and those who experienced at least one seizure a year. Conclusion: This work presents successful extraction of epilepsy clinical information and explores how this information can be used in epilepsy research. The approach taken in the development of ExECT v2, and the research linking the NLP outputs, routinely collected health care data, and genetics set the way for wider research.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea, Wales, UK</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Clinical natural language processing, information extraction, application validation, genetic data linkage, routinely collected health data</keywords><publishedDay>28</publishedDay><publishedMonth>9</publishedMonth><publishedYear>2023</publishedYear><publishedDate>2023-09-28</publishedDate><doi>10.23889/SUthesis.65061</doi><url/><notes>A selection of content is redacted or is partially redacted from this thesis to protect sensitive and personal information.</notes><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Halcox, Julian. 
and Pickrell, William Owen.</supervisor><degreelevel>Doctoral</degreelevel><degreename>Ph.D</degreename><degreesponsorsfunders>Staff funding</degreesponsorsfunders><apcterm/><funders>Swansea University Staff Funding</funders><projectreference/><lastEdited>2023-11-21T15:24:54.3701230</lastEdited><Created>2023-11-21T15:07:00.1202590</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - Health Data Science</level></path><authors><author><firstname>BEATA</firstname><surname>FONFERKO-SHADRACH</surname><order>1</order></author></authors><documents><document><filename>65061__29074__d5c6a7abc1604c8a8778d9d4ae853541.pdf</filename><originalFilename>2023_Fonferko-Shadrach_B.final.65061.pdf</originalFilename><uploaded>2023-11-21T15:20:53.1884437</uploaded><type>Output</type><contentLength>7629185</contentLength><contentType>application/pdf</contentType><version>E-Thesis – open access</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: The Author, Beata Fonferko-Shadrach, 2023. Distributed under the terms of a Creative Commons Attribution Non Commercial 4.0 License (CC BY-NC 4.0).</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by-nc/4.0/</licence></document></documents><OutputDurs/></rfc1807>
title	Natural language processing (NLP) for clinical information extraction and healthcare research
author_id_str_mv	2d17cc2bf75b0aa1d7ebead5778f8d88
author_id_fullname_str_mv	2d17cc2bf75b0aa1d7ebead5778f8d88_***_BEATA FONFERKO-SHADRACH
author	BEATA FONFERKO-SHADRACH
author2	BEATA FONFERKO-SHADRACH
format	E-Thesis
publishDate	2023
institution	Swansea University
doi_str_mv	10.23889/SUthesis.65061
college_str	Faculty of Medicine, Health and Life Sciences
hierarchy_top_id	facultyofmedicinehealthandlifesciences
hierarchy_top_title	Faculty of Medicine, Health and Life Sciences
hierarchy_parent_id	facultyofmedicinehealthandlifesciences
hierarchy_parent_title	Faculty of Medicine, Health and Life Sciences
department_str	Swansea University Medical School - Health Data Science; Faculty of Medicine, Health and Life Sciences
document_store_str	1
active_str	0
published_date	2023-09-28T05:15:43Z
_version_	1858707226487685120
score	11.453587

Similar Items