Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing

Ali, Stephen; Strafford, Huw; Dobbs, Thomas; Fonferko-Shadrach, Beata; Lacey, Arron; Pickrell, Owen; Hutchings, Hayley; Whitaker, Iain

doi:10.3389/fsurg.2022.870494

Journal article 1306 views 153 downloads

Stephen Ali, Huw Strafford, Thomas Dobbs, Beata Fonferko-Shadrach, Arron Lacey, Owen Pickrell, Hayley Hutchings, Iain Whitaker

Frontiers in Surgery, Volume: 9

Swansea University Authors: Stephen Ali, Huw Strafford, Thomas Dobbs, Beata Fonferko-Shadrach, Arron Lacey, Owen Pickrell, Hayley Hutchings, Iain Whitaker

PDF | Version of Record

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
Download (925.51KB)

DOI (Published version): 10.3389/fsurg.2022.870494

Abstract

Introduction: Routinely collected healthcare data are a powerful research resource, but often lack detailed disease-specific information that is collected in clinical free text such as histopathology reports. We aim to use Natural Language Processing (NLP) techniques to extract detailed clinical and p...

Published in:	Frontiers in Surgery
ISSN:	2296-875X
Published:	Frontiers Media SA 2022
Online Access:	Check full text
URI:	https://cronfa.swan.ac.uk/Record/cronfa60444

first_indexed	2022-07-26T12:50:22Z
last_indexed	2023-01-13T19:20:34Z
id	cronfa60444
recordtype	SURis
fullrecord	<?xml version="1.0"?><rfc1807><datestamp>2022-11-17T10:59:25.2703837</datestamp><bib-version>v2</bib-version><id>60444</id><entry>2022-07-11</entry><title>Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing</title><swanseaauthors><author><sid>8c210736c07c6aa2514e0f6b3cfd9764</sid><firstname>Stephen</firstname><surname>Ali</surname><name>Stephen Ali</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>a6389fc6d4d18e7b67033ee04b381e43</sid><firstname>Huw</firstname><surname>Strafford</surname><name>Huw Strafford</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>d18101ae0b4e72051f735ef68f45e1a8</sid><firstname>Thomas</firstname><surname>Dobbs</surname><name>Thomas Dobbs</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>7d3f1e80939f2b8fab6a16b5ec6ac845</sid><firstname>Beata</firstname><surname>Fonferko-Shadrach</surname><name>Beata Fonferko-Shadrach</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>b69d245574e754d2637cc9e76379fe11</sid><ORCID>0000-0001-7983-8073</ORCID><firstname>Arron</firstname><surname>Lacey</surname><name>Arron Lacey</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>1c3044b5ff7a6552ff5e8c9e3901c807</sid><ORCID>0000-0003-4396-5657</ORCID><firstname>Owen</firstname><surname>Pickrell</surname><name>Owen Pickrell</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>bdf5d5f154d339dd92bb25884b7c3652</sid><ORCID>0000-0003-4155-1741</ORCID><firstname>Hayley</firstname><surname>Hutchings</surname><name>Hayley Hutchings</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>830074c59291938a55b480dcbee4697e</sid><ORCID/><firstname>Iain</firstname><surname>Whitaker</surname><name>Iain 
Whitaker</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2022-07-11</date><abstract>Introduction: Routinely collected healthcare data are a powerful research resource, but often lack detailed disease-specific information that is collected in clinical free text such as histopathology reports. We aim to use Natural Language Processing (NLP) techniques to extract detailed clinical and pathological information from histopathology reports to enrich routinely collected data. Methods: We used the general architecture for text engineering (GATE) framework to build an NLP information extraction system using rule-based techniques. During validation we deployed our rule-based NLP pipeline on 200 previously unseen, de-identified and pseudonymised BCC histopathological reports from Swansea Bay University Health Board, Wales, UK. Results of our algorithm were compared to gold standard human abstraction by two independent and blinded expert clinicians involved in skin cancer care. Results: We identified 11,224 items of information with a mean precision, recall and F1 score of 86.0% (95% CI 75.1-96.9), 84.2% (95% CI 72.8-96.1) and 84.5% (95% CI 73.0-95.1) respectively. The difference between clinician annotator F1 scores was 7.9% in comparison to 15.5% between the NLP pipeline and the gold standard corpus.
Cohen's Kappa score on annotated tokens was 0.85. Conclusion: Using an NLP rule-based approach for NER in BCC we have been able to develop and validate a pipeline with a potential application in improving cancer registry data, service planning and enhancing the quality of routinely collected data for research.</abstract><type>Journal Article</type><journal>Frontiers in Surgery</journal><volume>9</volume><journalNumber/><paginationStart/><paginationEnd/><publisher>Frontiers Media SA</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic>2296-875X</issnElectronic><keywords/><publishedDay>24</publishedDay><publishedMonth>8</publishedMonth><publishedYear>2022</publishedYear><publishedDate>2022-08-24</publishedDate><doi>10.3389/fsurg.2022.870494</doi><url/><notes/><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><apcterm>Other</apcterm><funders>SRA and TDD are funded by the Welsh Clinical Academic Training Fellowship. ISW is the surgical Specialty Lead for Health and Care Research Wales and reports active grants from the American Association of Plastic Surgeons and the European Association of Plastic Surgeons, is an editor for Frontiers of Surgery, an associate editor for the Annals of Plastic Surgery, and is in the editorial board of BMC Medicine and numerous other editorial board roles. SRA received a grant from the British Association of Plastic, Reconstructive and Aesthetic Surgeons (BAPRAS) specifically for this work.
The Reconstructive Surgery &amp; Regenerative Medicine Research Centre is funded by The Scar Free Foundation.</funders><projectreference/><lastEdited>2022-11-17T10:59:25.2703837</lastEdited><Created>2022-07-11T14:12:11.6505192</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - Medicine</level></path><authors><author><firstname>Stephen</firstname><surname>Ali</surname><order>1</order></author><author><firstname>Huw</firstname><surname>Strafford</surname><order>2</order></author><author><firstname>Thomas</firstname><surname>Dobbs</surname><order>3</order></author><author><firstname>Beata</firstname><surname>Fonferko-Shadrach</surname><order>4</order></author><author><firstname>Arron</firstname><surname>Lacey</surname><orcid>0000-0001-7983-8073</orcid><order>5</order></author><author><firstname>Owen</firstname><surname>Pickrell</surname><orcid>0000-0003-4396-5657</orcid><order>6</order></author><author><firstname>Hayley</firstname><surname>Hutchings</surname><orcid>0000-0003-4155-1741</orcid><order>7</order></author><author><firstname>Iain</firstname><surname>Whitaker</surname><orcid/><order>8</order></author></authors><documents><document><filename>60444__25326__46bb9762fdb44892af1c6526f918327f.pdf</filename><originalFilename>60444.VOR.pdf</originalFilename><uploaded>2022-10-06T13:17:27.3868903</uploaded><type>Output</type><contentLength>947723</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>http://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807>
title	Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing
spellingShingle	Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing Stephen Ali Huw Strafford Thomas Dobbs Beata Fonferko-Shadrach Arron Lacey Owen Pickrell Hayley Hutchings Iain Whitaker
title_short	Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing
title_full	Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing
title_fullStr	Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing
title_full_unstemmed	Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing
title_sort	Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing
author_id_str_mv	8c210736c07c6aa2514e0f6b3cfd9764 a6389fc6d4d18e7b67033ee04b381e43 d18101ae0b4e72051f735ef68f45e1a8 7d3f1e80939f2b8fab6a16b5ec6ac845 b69d245574e754d2637cc9e76379fe11 1c3044b5ff7a6552ff5e8c9e3901c807 bdf5d5f154d339dd92bb25884b7c3652 830074c59291938a55b480dcbee4697e
author_id_fullname_str_mv	8c210736c07c6aa2514e0f6b3cfd9764_***_Stephen Ali a6389fc6d4d18e7b67033ee04b381e43_***_Huw Strafford d18101ae0b4e72051f735ef68f45e1a8_***_Thomas Dobbs 7d3f1e80939f2b8fab6a16b5ec6ac845_***_Beata Fonferko-Shadrach b69d245574e754d2637cc9e76379fe11_***_Arron Lacey 1c3044b5ff7a6552ff5e8c9e3901c807_***_Owen Pickrell bdf5d5f154d339dd92bb25884b7c3652_***_Hayley Hutchings 830074c59291938a55b480dcbee4697e_***_Iain Whitaker
author	Stephen Ali Huw Strafford Thomas Dobbs Beata Fonferko-Shadrach Arron Lacey Owen Pickrell Hayley Hutchings Iain Whitaker
author2	Stephen Ali Huw Strafford Thomas Dobbs Beata Fonferko-Shadrach Arron Lacey Owen Pickrell Hayley Hutchings Iain Whitaker
format	Journal article
container_title	Frontiers in Surgery
container_volume	9
publishDate	2022
institution	Swansea University
issn	2296-875X
doi_str_mv	10.3389/fsurg.2022.870494
publisher	Frontiers Media SA
college_str	Faculty of Medicine, Health and Life Sciences
hierarchytype
hierarchy_top_id	facultyofmedicinehealthandlifesciences
hierarchy_top_title	Faculty of Medicine, Health and Life Sciences
hierarchy_parent_id	facultyofmedicinehealthandlifesciences
hierarchy_parent_title	Faculty of Medicine, Health and Life Sciences
department_str	Swansea University Medical School - Medicine{{{_:::_}}}Faculty of Medicine, Health and Life Sciences{{{_:::_}}}Swansea University Medical School - Medicine
document_store_str	1
active_str	0
description	Introduction: Routinely collected healthcare data are a powerful research resource, but often lack detailed disease-specific information that is collected in clinical free text such as histopathology reports. We aim to use Natural Language Processing (NLP) techniques to extract detailed clinical and pathological information from histopathology reports to enrich routinely collected data. Methods: We used the general architecture for text engineering (GATE) framework to build an NLP information extraction system using rule-based techniques. During validation we deployed our rule-based NLP pipeline on 200 previously unseen, de-identified and pseudonymised BCC histopathological reports from Swansea Bay University Health Board, Wales, UK. Results of our algorithm were compared to gold standard human abstraction by two independent and blinded expert clinicians involved in skin cancer care. Results: We identified 11,224 items of information with a mean precision, recall and F1 score of 86.0% (95% CI 75.1-96.9), 84.2% (95% CI 72.8-96.1) and 84.5% (95% CI 73.0-95.1) respectively. The difference between clinician annotator F1 scores was 7.9% in comparison to 15.5% between the NLP pipeline and the gold standard corpus. Cohen's Kappa score on annotated tokens was 0.85. Conclusion: Using an NLP rule-based approach for NER in BCC we have been able to develop and validate a pipeline with a potential application in improving cancer registry data, service planning and enhancing the quality of routinely collected data for research.
published_date	2022-08-24T05:03:43Z
_version_	1858706471358824448
score	11.453587
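
Note on the metrics quoted in the abstract: precision, recall and F1 are reported as means. The sketch below is not the authors' code and is not taken from the article; it only illustrates, under assumed inputs, how these scores are conventionally computed from true-positive, false-positive and false-negative counts. The entity names and counts are hypothetical. It also shows that a mean of per-entity F1 scores generally differs from the F1 computed from mean precision and mean recall, which may be why the harmonic mean of the reported 86.0% and 84.2% (about 85.1%) does not equal the reported mean F1 of 84.5%.

```python
from statistics import mean


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one entity type.

    tp, fp, fn are counts of true positives, false positives and
    false negatives against a gold standard annotation.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical per-entity counts (tp, fp, fn); not data from the study.
counts = {
    "entity_a": (90, 10, 5),
    "entity_b": (40, 20, 30),
}

scores = {name: precision_recall_f1(*c) for name, c in counts.items()}

# Mean of the per-entity scores (macro averaging).
macro_p = mean(s[0] for s in scores.values())
macro_r = mean(s[1] for s in scores.values())
macro_f1 = mean(s[2] for s in scores.values())

# F1 computed from the mean precision and mean recall instead.
f1_of_means = 2 * macro_p * macro_r / (macro_p + macro_r)

print(f"mean precision = {macro_p:.3f}")
print(f"mean recall    = {macro_r:.3f}")
print(f"mean F1        = {macro_f1:.3f}")
print(f"F1 of means    = {f1_of_means:.3f}")
```

Whether the article averaged per entity type, per report, or in some other way is not stated in this record, so the averaging scheme above is an assumption.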

Similar Items