E-Thesis
Raman Spectroscopy with Machine Learning in the Assessment of a FIT-Positive Bowel Screening Population: Assessing the Feasibility of Detecting Colorectal Cancer and Adenomas Using Human Serum Samples / DREW MAGOWAN
Swansea University Author: DREW MAGOWAN
DOI (Published version): 10.23889/SUthesis.70267
Abstract
This thesis describes Raman spectroscopy combined with machine learning models for the non-invasive diagnosis of colorectal cancer and colorectal adenomas in a bowel screening population who have tested positive using a standard faecal immunochemical test. The aims were to review relevant current li...
| Published: | Swansea, Wales, UK, 2025 |
|---|---|
| Institution: | Swansea University |
| Degree level: | Doctoral |
| Degree name: | M.D |
| Supervisor: | Owen, Rhiannon ; Dunstan, Peter |
| URI: | https://cronfa.swan.ac.uk/Record/cronfa70267 |
| first_indexed | 2025-09-04T15:21:25Z |
|---|---|
| last_indexed | 2025-09-05T06:13:19Z |
| id | cronfa70267 |
| recordtype | RisThesis |
| fullrecord |
<?xml version="1.0"?><rfc1807><datestamp>2025-09-04T16:35:00.1567808</datestamp><bib-version>v2</bib-version><id>70267</id><entry>2025-09-04</entry><title>Raman Spectroscopy with Machine Learning in the Assessment of a FIT-Positive Bowel Screening Population: Assessing the Feasibility of Detecting Colorectal Cancer and Adenomas Using Human Serum Samples</title><swanseaauthors><author><sid>3749445ee5db1aee93a35fabf5f792bf</sid><firstname>DREW</firstname><surname>MAGOWAN</surname><name>DREW MAGOWAN</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2025-09-04</date><abstract>This thesis describes Raman spectroscopy combined with machine learning models for the non-invasive diagnosis of colorectal cancer and colorectal adenomas in a bowel screening population who have tested positive using a standard faecal immunochemical test. The aims were to review relevant current literature in blood-based biomarkers for colorectal cancer and colorectal adenomas, and to describe study methods and results including population characteristics, Raman spectral comparative analysis and machine learning model diagnostic classification outcomes. A literature review identified a growing field of diagnostic tests with acceptable sensitivity and specificity, comparable or superior to faecal-based testing. However, studies demonstrated a broad range of heterogenous tests, techniques and reporting quality which made objective comparisons and selecting the best candidates difficult. For this reason, a narrative literature review was preferred to a systematic review and meta-analysis. Supervised and unsupervised analysis was undertaken for pre-processed Raman spectral data from 400 serum samples using principal component analysis, random forest ranked features of importance and Mann-Whitney U testing of mean spectra. 
These analyses were chosen to reduce data dimensionality, highlight spectral patterns and test asymmetrical data for statistically significant differences between spectra. Spectral variance was low; however, multiple wavenumber regions of interest were identified and cross-referenced with known Raman peak assignments to identify potential underlying biomolecules involved in group differentiation. Biomolecule classes of interest included fatty acids, carbohydrates, amino acids, nucleotides and other molecules including lipids. Machine learning models including random forest, extreme gradient boost, logistic regression (with and without elastic net regularisation) and support vector machine were trained using preprocessed Raman spectral data for each set of diagnostic groups. These models were chosen due to their proven classification ability in other studies involving biological samples. Diagnostic classification area under the curve (AUC) ranged from 0.348 (95% CI 0.260 to 0.436) to 0.583 (95% CI 0.424 to 0.694). These results likely arose from low classification power resulting from low spectral variance between groups, a high number of training variables, inadequate sample size, biologically complex samples, a lack of significantly advanced cancers and the dilutional effect of a large colorectal adenoma population. There remains potential clinical utility for Raman spectroscopy as an adjunct to (or to replace) faecal tests for colorectal cancer screening. However, current AUC results do not support its use at present.
A much higher sample number will be required to allow a fuller understanding of machine model classification ability and a more informed discussion regarding its use in the screening pathway.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea, Wales, UK</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Colorectal cancer, Raman spectroscopy, colorectal adenoma, colorectal polyp, liquid biopsy, machine learning, screening</keywords><publishedDay>26</publishedDay><publishedMonth>6</publishedMonth><publishedYear>2025</publishedYear><publishedDate>2025-06-26</publishedDate><doi>10.23889/SUthesis.70267</doi><url/><notes>ORCiD identifier: https://orcid.org/0000-0002-5086-2720</notes><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Owen, Rhiannon ; Dunstan, Peter</supervisor><degreelevel>Doctoral</degreelevel><degreename>M.D</degreename><apcterm/><funders/><projectreference/><lastEdited>2025-09-04T16:35:00.1567808</lastEdited><Created>2025-09-04T16:18:27.5766275</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - Biomedical Science</level></path><authors><author><firstname>DREW</firstname><surname>MAGOWAN</surname><order>1</order></author></authors><documents><document><filename>70267__35044__bf2abc6e81034bf8a11d6ac12f9cb56b.pdf</filename><originalFilename>Magowan_Drew_S_MD_Thesis_Final_Cronfa.pdf</originalFilename><uploaded>2025-09-04T16:28:20.6143530</uploaded><type>Output</type><contentLength>13587839</contentLength><contentType>application/pdf</contentType><version>E-Thesis – open access</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: The Author, Drew S. 
Magowan, 2025.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807> |
| title | Raman Spectroscopy with Machine Learning in the Assessment of a FIT-Positive Bowel Screening Population: Assessing the Feasibility of Detecting Colorectal Cancer and Adenomas Using Human Serum Samples |
| author_id_str_mv | 3749445ee5db1aee93a35fabf5f792bf |
| author_id_fullname_str_mv | 3749445ee5db1aee93a35fabf5f792bf_***_DREW MAGOWAN |
| author | DREW MAGOWAN |
| author2 | DREW MAGOWAN |
| format | E-Thesis |
| publishDate | 2025 |
| institution | Swansea University |
| doi_str_mv | 10.23889/SUthesis.70267 |
| college_str | Faculty of Medicine, Health and Life Sciences |
| hierarchytype | |
| hierarchy_top_id | facultyofmedicinehealthandlifesciences |
| hierarchy_top_title | Faculty of Medicine, Health and Life Sciences |
| hierarchy_parent_id | facultyofmedicinehealthandlifesciences |
| hierarchy_parent_title | Faculty of Medicine, Health and Life Sciences |
| department_str | Swansea University Medical School - Biomedical Science{{{_:::_}}}Faculty of Medicine, Health and Life Sciences{{{_:::_}}}Swansea University Medical School - Biomedical Science |
| document_store_str | 1 |
| active_str | 0 |
| published_date | 2025-06-26T05:30:26Z |
| _version_ | 1851098007075815424 |
| score | 11.089386 |

