E-Thesis

Raman Spectroscopy with Machine Learning in the Assessment of a FIT-Positive Bowel Screening Population: Assessing the Feasibility of Detecting Colorectal Cancer and Adenomas Using Human Serum Samples / DREW MAGOWAN

Swansea University Author: DREW MAGOWAN

DOI (Published version): 10.23889/SUthesis.70267

Abstract

This thesis describes Raman spectroscopy combined with machine learning models for the non-invasive diagnosis of colorectal cancer and colorectal adenomas in a bowel screening population who have tested positive using a standard faecal immunochemical test. The aims were to review relevant current li...

Published: Swansea, Wales, UK 2025
Institution: Swansea University
Degree level: Doctoral
Degree name: M.D
Supervisor: Owen, Rhiannon ; Dunstan, Peter
URI: https://cronfa.swan.ac.uk/Record/cronfa70267
first_indexed 2025-09-04T15:21:25Z
last_indexed 2025-09-05T06:13:19Z
id cronfa70267
recordtype RisThesis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2025-09-04T16:35:00.1567808</datestamp><bib-version>v2</bib-version><id>70267</id><entry>2025-09-04</entry><title>Raman Spectroscopy with Machine Learning in the Assessment of a FIT-Positive Bowel Screening Population: Assessing the Feasibility of Detecting Colorectal Cancer and Adenomas Using Human Serum Samples</title><swanseaauthors><author><sid>3749445ee5db1aee93a35fabf5f792bf</sid><firstname>DREW</firstname><surname>MAGOWAN</surname><name>DREW MAGOWAN</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2025-09-04</date><abstract>This thesis describes Raman spectroscopy combined with machine learning models for the non-invasive diagnosis of colorectal cancer and colorectal adenomas in a bowel screening population who have tested positive using a standard faecal immunochemical test. The aims were to review relevant current literature in blood-based biomarkers for colorectal cancer and colorectal adenomas, and to describe study methods and results including population characteristics, Raman spectral comparative analysis and machine learning model diagnostic classification outcomes. A literature review identified a growing field of diagnostic tests with acceptable sensitivity and specificity, comparable or superior to faecal-based testing. However, studies demonstrated a broad range of heterogeneous tests, techniques and reporting quality which made objective comparisons and selecting the best candidates difficult. For this reason, a narrative literature review was preferred to a systematic review and meta-analysis. Supervised and unsupervised analysis was undertaken for pre-processed Raman spectral data from 400 serum samples using principal component analysis, random forest ranked features of importance and Mann-Whitney U testing of mean spectra. 
These analyses were chosen to reduce data dimensionality, highlight spectral patterns and to test asymmetrical data for statistically significant differences between spectra. Spectral variance was low; however, multiple wavenumber regions of interest were identified and cross-referenced with known Raman peak assignments to identify potential underlying biomolecules involved in group differentiation. Biomolecule classes of interest included fatty acids, carbohydrates, amino acids, nucleotides and other molecules including lipids. Machine learning models including random forest, extreme gradient boost, logistic regression (with and without elastic net regularisation) and support vector machine were trained using preprocessed Raman spectral data for each set of diagnostic groups. These models were chosen due to their proven classification ability in other studies involving biological samples. Diagnostic classification area under the curve (AUC) ranged from 0.348 (95% CI 0.260 to 0.436) to 0.583 (95% CI 0.424 to 0.694). These results likely arose from low classification power resulting from low spectral variance between groups, a high number of training variables, inadequate sample size, biologically complex samples, a lack of significantly advanced cancers and the dilutional effect of a large colorectal adenoma population. There remains potential clinical utility for Raman spectroscopy as an adjunct to (or to replace) faecal tests for colorectal cancer screening. However, current AUC results do not support its use at present. 
A much higher sample number will be required to allow a fuller understanding of machine model classification ability and a more informed discussion regarding its use in the screening pathway.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea, Wales, UK</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Colorectal cancer, Raman spectroscopy, colorectal adenoma, colorectal polyp, liquid biopsy, machine learning, screening</keywords><publishedDay>26</publishedDay><publishedMonth>6</publishedMonth><publishedYear>2025</publishedYear><publishedDate>2025-06-26</publishedDate><doi>10.23889/SUthesis.70267</doi><url/><notes>ORCiD identifier: https://orcid.org/0000-0002-5086-2720</notes><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Owen, Rhiannon ; Dunstan, Peter</supervisor><degreelevel>Doctoral</degreelevel><degreename>M.D</degreename><apcterm/><funders/><projectreference/><lastEdited>2025-09-04T16:35:00.1567808</lastEdited><Created>2025-09-04T16:18:27.5766275</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - Biomedical Science</level></path><authors><author><firstname>DREW</firstname><surname>MAGOWAN</surname><order>1</order></author></authors><documents><document><filename>70267__35044__bf2abc6e81034bf8a11d6ac12f9cb56b.pdf</filename><originalFilename>Magowan_Drew_S_MD_Thesis_Final_Cronfa.pdf</originalFilename><uploaded>2025-09-04T16:28:20.6143530</uploaded><type>Output</type><contentLength>13587839</contentLength><contentType>application/pdf</contentType><version>E-Thesis &#x2013; open access</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: The Author, Drew S. 
Magowan, 2025.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title Raman Spectroscopy with Machine Learning in the Assessment of a FIT-Positive Bowel Screening Population: Assessing the Feasibility of Detecting Colorectal Cancer and Adenomas Using Human Serum Samples
author_id_str_mv 3749445ee5db1aee93a35fabf5f792bf
author_id_fullname_str_mv 3749445ee5db1aee93a35fabf5f792bf_***_DREW MAGOWAN
author DREW MAGOWAN
author2 DREW MAGOWAN
format E-Thesis
publishDate 2025
institution Swansea University
doi_str_mv 10.23889/SUthesis.70267
college_str Faculty of Medicine, Health and Life Sciences
hierarchytype
hierarchy_top_id facultyofmedicinehealthandlifesciences
hierarchy_top_title Faculty of Medicine, Health and Life Sciences
hierarchy_parent_id facultyofmedicinehealthandlifesciences
hierarchy_parent_title Faculty of Medicine, Health and Life Sciences
department_str Swansea University Medical School - Biomedical Science{{{_:::_}}}Faculty of Medicine, Health and Life Sciences{{{_:::_}}}Swansea University Medical School - Biomedical Science
document_store_str 1
active_str 0
description This thesis describes Raman spectroscopy combined with machine learning models for the non-invasive diagnosis of colorectal cancer and colorectal adenomas in a bowel screening population who have tested positive using a standard faecal immunochemical test. The aims were to review relevant current literature in blood-based biomarkers for colorectal cancer and colorectal adenomas, and to describe study methods and results including population characteristics, Raman spectral comparative analysis and machine learning model diagnostic classification outcomes. A literature review identified a growing field of diagnostic tests with acceptable sensitivity and specificity, comparable or superior to faecal-based testing. However, studies demonstrated a broad range of heterogeneous tests, techniques and reporting quality which made objective comparisons and selecting the best candidates difficult. For this reason, a narrative literature review was preferred to a systematic review and meta-analysis. Supervised and unsupervised analysis was undertaken for pre-processed Raman spectral data from 400 serum samples using principal component analysis, random forest ranked features of importance and Mann-Whitney U testing of mean spectra. These analyses were chosen to reduce data dimensionality, highlight spectral patterns and to test asymmetrical data for statistically significant differences between spectra. Spectral variance was low; however, multiple wavenumber regions of interest were identified and cross-referenced with known Raman peak assignments to identify potential underlying biomolecules involved in group differentiation. Biomolecule classes of interest included fatty acids, carbohydrates, amino acids, nucleotides and other molecules including lipids. 
Machine learning models including random forest, extreme gradient boost, logistic regression (with and without elastic net regularisation) and support vector machine were trained using preprocessed Raman spectral data for each set of diagnostic groups. These models were chosen due to their proven classification ability in other studies involving biological samples. Diagnostic classification area under the curve (AUC) ranged from 0.348 (95% CI 0.260 to 0.436) to 0.583 (95% CI 0.424 to 0.694). These results likely arose from low classification power resulting from low spectral variance between groups, a high number of training variables, inadequate sample size, biologically complex samples, a lack of significantly advanced cancers and the dilutional effect of a large colorectal adenoma population. There remains potential clinical utility for Raman spectroscopy as an adjunct to (or to replace) faecal tests for colorectal cancer screening. However, current AUC results do not support its use at present. A much higher sample number will be required to allow a fuller understanding of machine model classification ability and a more informed discussion regarding its use in the screening pathway.
published_date 2025-06-26T05:30:26Z
_version_ 1851098007075815424
score 11.089386