Journal article

Development and validation of an automated basal cell carcinoma histopathology information extraction system using natural language processing

Stephen Ali, Huw Strafford, Thomas Dobbs, Beata Fonferko-Shadrach, Arron Lacey, Owen Pickrell, Hayley Hutchings, Iain Whitaker

Frontiers in Surgery, Volume: 9

Swansea University Authors: Stephen Ali, Huw Strafford, Thomas Dobbs, Beata Fonferko-Shadrach, Arron Lacey, Owen Pickrell, Hayley Hutchings, Iain Whitaker

  • 60444.VOR.pdf

    PDF | Version of Record

    This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).

Published in: Frontiers in Surgery
ISSN: 2296-875X
Published: Frontiers Media SA 2022

URI: https://cronfa.swan.ac.uk/Record/cronfa60444
Abstract: Introduction: Routinely collected healthcare data are a powerful research resource, but often lack detailed disease-specific information that is collected in clinical free text such as histopathology reports. We aim to use Natural Language Processing (NLP) techniques to extract detailed clinical and pathological information from histopathology reports to enrich routinely collected data. Methods: We used the general architecture for text engineering (GATE) framework to build an NLP information extraction system using rule-based techniques. During validation we deployed our rule-based NLP pipeline on 200 previously unseen, de-identified and pseudonymised BCC histopathological reports from Swansea Bay University Health Board, Wales, UK. Results of our algorithm were compared to gold standard human abstraction by two independent and blinded expert clinicians involved in skin cancer care. Results: We identified 11,224 items of information with a mean precision, recall and F1 score of 86.0% (95% CI 75.1-96.9), 84.2% (95% CI 72.8-96.1) and 84.5% (95% CI 73.0-95.1) respectively. The difference between clinician annotator F1 scores was 7.9% in comparison to 15.5% between the NLP pipeline and the gold standard corpus. Cohen's Kappa score on annotated tokens was 0.85. Conclusion: Using an NLP rule-based approach for NER in BCC we have been able to develop and validate a pipeline with a potential application in improving cancer registry data, service planning and enhancing the quality of routinely collected data for research.
College: Faculty of Medicine, Health and Life Sciences
Funders: SRA and TDD are funded by the Welsh Clinical Academic Training Fellowship. ISW is the surgical Specialty Lead for Health and Care Research Wales and reports active grants from the American Association of Plastic Surgeons and the European Association of Plastic Surgeons. ISW is also an editor for Frontiers in Surgery, an associate editor for the Annals of Plastic Surgery, and a member of the editorial board of BMC Medicine, and holds numerous other editorial board roles. SRA received a grant from the British Association of Plastic, Reconstructive and Aesthetic Surgeons (BAPRAS) specifically for this work. The Reconstructive Surgery & Regenerative Medicine Research Centre is funded by The Scar Free Foundation.
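
Note on the evaluation metrics: the abstract reports precision, recall, F1 and Cohen's Kappa but this record does not show how they were computed. The following is a minimal sketch of the standard definitions, not the authors' implementation. The `Counts` class, the function names and all example values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Counts:
    """Hypothetical tallies from comparing system output with a gold standard."""
    tp: int  # extracted items that match the gold standard
    fp: int  # extracted items absent from the gold standard
    fn: int  # gold-standard items the system missed


def precision(c: Counts) -> float:
    # Share of extracted items that are correct.
    total = c.tp + c.fp
    return c.tp / total if total else 0.0


def recall(c: Counts) -> float:
    # Share of gold-standard items that were found.
    total = c.tp + c.fn
    return c.tp / total if total else 0.0


def f1(c: Counts) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    # Agreement between two annotators, corrected for chance agreement.
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (list(labels_a).count(k) / n) * (list(labels_b).count(k) / n)
        for k in categories
    )
    # If chance agreement is total, both annotators used one identical label.
    return 1.0 if expected == 1.0 else (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Made-up counts and labels, not data from the study.
    example = Counts(tp=8, fp=2, fn=2)
    print(precision(example), recall(example), f1(example))
    print(cohen_kappa(["x", "x", "y", "y"], ["x", "x", "y", "x"]))
```

Per-entity scores computed this way could then be averaged to give a mean figure such as the one quoted in the abstract, although the averaging scheme used in the study is not stated in this record.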