E-Thesis

Natural language processing (NLP) for clinical information extraction and healthcare research / BEATA FONFERKO-SHADRACH

Swansea University Author: BEATA FONFERKO-SHADRACH

  • 2023_Fonferko-Shadrach_B.final.65061.pdf

    PDF | E-Thesis – open access

    Copyright: The Author, Beata Fonferko-Shadrach, 2023. Distributed under the terms of a Creative Commons Attribution Non Commercial 4.0 License (CC BY-NC 4.0).

    Download (7.28MB)

DOI (Published version): 10.23889/SUthesis.65061

Published: Swansea, Wales, UK 2023
Institution: Swansea University
Degree level: Doctoral
Degree name: Ph.D
Supervisor: Halcox, Julian and Pickrell, William Owen
URI: https://cronfa.swan.ac.uk/Record/cronfa65061
Abstract: Introduction: Epilepsy is a common disease with multiple comorbidities. Routinely collected health care data have been successfully used in epilepsy research, but they lack the level of detail needed for in-depth study of complex interactions between the aetiology, comorbidities, and treatment that affect patient outcomes. The aim of this work is to use natural language processing (NLP) technology to create detailed disease-specific datasets derived from the free text of clinic letters in order to enrich the information that is already available. Method: An NLP pipeline for the extraction of epilepsy clinical text (ExECT) was redeveloped to extract a wider range of variables. A gold standard annotation set for epilepsy clinic letters was created for the validation of the ExECT v2 output. A set of clinic letters from the Epi25 study was processed and the datasets produced were validated against Swansea Neurology Biobank records. A data linkage study investigating genetic influences on epilepsy outcomes using GP and hospital records was supplemented with the seizure frequency dataset produced by ExECT v2. Results: The validation of ExECT v2 produced overall precision, recall, and F1 score of 0.90, 0.86, and 0.88, respectively. A method of uploading, annotating, and linking genetic variant datasets within the SAIL databank was established. No significant differences in the genetic burden of rare and potentially damaging variants were observed between individuals with vs without unscheduled admissions, or between individuals on monotherapy vs polytherapy. No significant difference was observed in the genetic burden between people who were seizure free for over a year and those who experienced at least one seizure a year. Conclusion: This work presents successful extraction of epilepsy clinical information and explores how this information can be used in epilepsy research. The approach taken in the development of ExECT v2, and the research linking the NLP outputs, routinely collected health care data, and genetics, pave the way for wider research.
Item Description: A selection of content is redacted or is partially redacted from this thesis to protect sensitive and personal information.
Keywords: Clinical natural language processing, information extraction, application validation, genetic data linkage, routinely collected health data
College: Faculty of Medicine, Health and Life Sciences
Funders: Swansea University Staff Funding
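
Note on the reported validation scores: the abstract gives overall precision, recall, and F1 score of 0.90, 0.86, and 0.88 for ExECT v2. The short sketch below is not taken from the thesis; it assumes the standard definition of F1 as the harmonic mean of precision and recall, and simply checks that the three rounded figures are mutually consistent under that definition. The function name `f1_score` is illustrative, and the thesis may compute its overall score differently (for example, from unrounded or per-variable values).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall values as reported in the abstract (already rounded to 2 d.p.).
reported_precision = 0.90
reported_recall = 0.86

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))
```

Under these assumptions the result rounds to 0.88, which agrees with the F1 score stated in the abstract.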