E-Thesis
Natural language processing (NLP) for clinical information extraction and healthcare research / BEATA FONFERKO-SHADRACH
Swansea University Author: BEATA FONFERKO-SHADRACH
PDF | E-Thesis – open access
Copyright: The Author, Beata Fonferko-Shadrach, 2023. Distributed under the terms of a Creative Commons Attribution Non Commercial 4.0 License (CC BY-NC 4.0).
DOI (Published version): 10.23889/SUthesis.65061
Abstract
Introduction: Epilepsy is a common disease with multiple comorbidities. Routinely collected health care data have been successfully used in epilepsy research, but they lack the level of detail needed for in-depth study of complex interactions between the aetiology, comorbidities, and treatment that...
Published: | Swansea, Wales, UK, 2023 |
---|---|
Institution: | Swansea University |
Degree level: | Doctoral |
Degree name: | Ph.D |
Supervisor: | Halcox, Julian and Pickrell, William Owen |
URI: | https://cronfa.swan.ac.uk/Record/cronfa65061 |
first_indexed |
2023-11-21T15:12:28Z |
---|---|
last_indexed |
2023-11-21T15:12:28Z |
id |
cronfa65061 |
recordtype |
RisThesis |
fullrecord |
<?xml version="1.0" encoding="utf-8"?><rfc1807 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><bib-version>v2</bib-version><id>65061</id><entry>2023-11-21</entry><title>Natural language processing (NLP) for clinical information extraction and healthcare research</title><swanseaauthors><author><sid>2d17cc2bf75b0aa1d7ebead5778f8d88</sid><firstname>BEATA</firstname><surname>FONFERKO-SHADRACH</surname><name>BEATA FONFERKO-SHADRACH</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2023-11-21</date><abstract>Introduction: Epilepsy is a common disease with multiple comorbidities. Routinely collected health care data have been successfully used in epilepsy research, but they lack the level of detail needed for in-depth study of complex interactions between the aetiology, comorbidities, and treatment that affect patient outcomes. The aim of this work is to use natural language processing (NLP) technology to create detailed disease-specific datasets derived from the free text of clinic letters in order to enrich the information that is already available. Method: An NLP pipeline for the extraction of epilepsy clinical text (ExECT) was redeveloped to extract a wider range of variables. A gold standard annotation set for epilepsy clinic letters was created for the validation of the ExECT v2 output. A set of clinic letters from the Epi25 study was processed and the datasets produced were validated against Swansea Neurology Biobank records. A data linkage study investigating genetic influences on epilepsy outcomes using GP and hospital records was supplemented with the seizure frequency dataset produced by ExECT v2. Results: The validation of ExECT v2 produced overall precision, recall, and F1 score of 0.90, 0.86, and 0.88, respectively. A method of uploading, annotating, and linking genetic variant datasets within the SAIL databank was established. 
No significant differences in the genetic burden of rare and potentially damaging variants were observed between the individuals with vs without unscheduled admissions, and between individuals on monotherapy vs polytherapy. No significant difference was observed in the genetic burden between people who were seizure free for over a year and those who experienced at least one seizure a year. Conclusion: This work presents successful extraction of epilepsy clinical information and explores how this information can be used in epilepsy research. The approach taken in the development of ExECT v2, and the research linking the NLP outputs, routinely collected health care data, and genetics set the way for wider research.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea, Wales, UK</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Clinical natural language processing, information extraction, application validation, genetic data linkage, routinely collected health data</keywords><publishedDay>28</publishedDay><publishedMonth>9</publishedMonth><publishedYear>2023</publishedYear><publishedDate>2023-09-28</publishedDate><doi>10.23889/SUthesis.65061</doi><url/><notes>A selection of content is redacted or is partially redacted from this thesis to protect sensitive and personal information.</notes><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Halcox, Julian. 
and Pickrell, William Owen.</supervisor><degreelevel>Doctoral</degreelevel><degreename>Ph.D</degreename><degreesponsorsfunders>Staff funding</degreesponsorsfunders><apcterm/><funders>Swansea University Staff Funding</funders><projectreference/><lastEdited>2023-11-21T15:24:54.3701230</lastEdited><Created>2023-11-21T15:07:00.1202590</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - Health Data Science</level></path><authors><author><firstname>BEATA</firstname><surname>FONFERKO-SHADRACH</surname><order>1</order></author></authors><documents><document><filename>65061__29074__d5c6a7abc1604c8a8778d9d4ae853541.pdf</filename><originalFilename>2023_Fonferko-Shadrach_B.final.65061.pdf</originalFilename><uploaded>2023-11-21T15:20:53.1884437</uploaded><type>Output</type><contentLength>7629185</contentLength><contentType>application/pdf</contentType><version>E-Thesis – open access</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: The Author, Beata Fonferko-Shadrach, 2023. Distributed under the terms of a Creative Commons Attribution Non Commercial 4.0 License (CC BY-NC 4.0).</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by-nc/4.0/</licence></document></documents><OutputDurs/></rfc1807> |
title |
Natural language processing (NLP) for clinical information extraction and healthcare research |
author_id_str_mv |
2d17cc2bf75b0aa1d7ebead5778f8d88 |
author_id_fullname_str_mv |
2d17cc2bf75b0aa1d7ebead5778f8d88_***_BEATA FONFERKO-SHADRACH |
author |
BEATA FONFERKO-SHADRACH |
author2 |
BEATA FONFERKO-SHADRACH |
format |
E-Thesis |
publishDate |
2023 |
institution |
Swansea University |
doi_str_mv |
10.23889/SUthesis.65061 |
college_str |
Faculty of Medicine, Health and Life Sciences |
hierarchytype |
|
hierarchy_top_id |
facultyofmedicinehealthandlifesciences |
hierarchy_top_title |
Faculty of Medicine, Health and Life Sciences |
hierarchy_parent_id |
facultyofmedicinehealthandlifesciences |
hierarchy_parent_title |
Faculty of Medicine, Health and Life Sciences |
department_str |
Swansea University Medical School - Health Data Science; Faculty of Medicine, Health and Life Sciences |
document_store_str |
1 |
active_str |
0 |
description |
Introduction: Epilepsy is a common disease with multiple comorbidities. Routinely collected health care data have been successfully used in epilepsy research, but they lack the level of detail needed for in-depth study of complex interactions between the aetiology, comorbidities, and treatment that affect patient outcomes. The aim of this work is to use natural language processing (NLP) technology to create detailed disease-specific datasets derived from the free text of clinic letters in order to enrich the information that is already available. Method: An NLP pipeline for the extraction of epilepsy clinical text (ExECT) was redeveloped to extract a wider range of variables. A gold standard annotation set for epilepsy clinic letters was created for the validation of the ExECT v2 output. A set of clinic letters from the Epi25 study was processed and the datasets produced were validated against Swansea Neurology Biobank records. A data linkage study investigating genetic influences on epilepsy outcomes using GP and hospital records was supplemented with the seizure frequency dataset produced by ExECT v2. Results: The validation of ExECT v2 produced overall precision, recall, and F1 score of 0.90, 0.86, and 0.88, respectively. A method of uploading, annotating, and linking genetic variant datasets within the SAIL databank was established. No significant differences in the genetic burden of rare and potentially damaging variants were observed between the individuals with vs without unscheduled admissions, and between individuals on monotherapy vs polytherapy. No significant difference was observed in the genetic burden between people who were seizure free for over a year and those who experienced at least one seizure a year. Conclusion: This work presents successful extraction of epilepsy clinical information and explores how this information can be used in epilepsy research. 
The approach taken in the development of ExECT v2, and the research linking the NLP outputs, routinely collected health care data, and genetics set the way for wider research. |
published_date |
2023-09-28T15:25:01Z |