Journal article
Predicting a diagnosis of ankylosing spondylitis using primary care health records–A machine learning approach
PLOS ONE, Volume: 18, Issue: 3, Start page: e0279076
Swansea University Authors: Jonathan Kennedy, Tash Kennedy, Roxanne Cooksey, Sinead Brophy
PDF | Version of Record
Copyright: © 2023 Kennedy et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
DOI (Published version): 10.1371/journal.pone.0279076
Abstract
Ankylosing spondylitis is the second most common cause of inflammatory arthritis. However, a successful diagnosis can take a decade to confirm from symptom onset (via x-rays). The aim of this study was to use machine learning methods to develop a profile of the characteristics of people who are like...
Published in: | PLOS ONE |
---|---|
ISSN: | 1932-6203 |
Published: | Public Library of Science (PLoS), 2023 |
URI: | https://cronfa.swan.ac.uk/Record/cronfa63908 |
first_indexed |
2023-07-18T15:09:33Z |
---|---|
last_indexed |
2023-07-18T15:09:33Z |
id |
cronfa63908 |
recordtype |
SURis |
fullrecord |
<?xml version="1.0" encoding="utf-8"?><rfc1807 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><bib-version>v2</bib-version><id>63908</id><entry>2023-07-18</entry><title>Predicting a diagnosis of ankylosing spondylitis using primary care health records–A machine learning approach</title><swanseaauthors><author><sid>08163d1f58d7fefcb1c695bcc2e0ef68</sid><ORCID/><firstname>Jonathan</firstname><surname>Kennedy</surname><name>Jonathan Kennedy</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>3f6f07de33204db4c0ab665fb4b36367</sid><firstname>Tash Kennedy</firstname><surname>Kennedy</surname><name>Tash Kennedy Kennedy</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>df63826249b712dcb03cb0161d0f3daf</sid><ORCID>0000-0002-6763-9373</ORCID><firstname>Roxanne</firstname><surname>Cooksey</surname><name>Roxanne Cooksey</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>84f5661b35a729f55047f9e793d8798b</sid><ORCID>0000-0001-7417-2858</ORCID><firstname>Sinead</firstname><surname>Brophy</surname><name>Sinead Brophy</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2023-07-18</date><deptcode>HDAT</deptcode><abstract>Ankylosing spondylitis is the second most common cause of inflammatory arthritis. However, a successful diagnosis can take a decade to confirm from symptom onset (via x-rays). The aim of this study was to use machine learning methods to develop a profile of the characteristics of people who are likely to be given a diagnosis of AS in future. The Secure Anonymised Information Linkage databank was used. Patients with ankylosing spondylitis were identified using their routine data and matched with controls who had no record of a diagnosis of ankylosing spondylitis or axial spondyloarthritis. Data was analysed separately for men and women. 
The model was developed using feature/variable selection and principal component analysis to develop decision trees. The decision tree with the highest average F value was selected and validated with a test dataset. The model for men indicated that lower back pain, uveitis, and NSAID use under age 20 is associated with AS development. The model for women showed an older age of symptom presentation compared to men with back pain and multiple pain relief medications. The models showed good prediction (positive predictive value 70%-80%) in test data but in the general population where prevalence is very low (0.09% of the population in this dataset) the positive predictive value would be very low (0.33%-0.25%). Machine learning can be used to help profile and understand the characteristics of people who will develop AS, and in test datasets with artificially high prevalence, will perform well. However, when applied to a general population with low prevalence rates, such as that in primary care, the positive predictive value for even the best model would be 1.4%. 
Multiple models may be needed to narrow down the population over time to improve the predictive value and therefore reduce the time to diagnosis of ankylosing spondylitis.</abstract><type>Journal Article</type><journal>PLOS ONE</journal><volume>18</volume><journalNumber>3</journalNumber><paginationStart>e0279076</paginationStart><paginationEnd/><publisher>Public Library of Science (PLoS)</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic>1932-6203</issnElectronic><keywords>Ankylosing spondylitis, arthritis, pain, machine learning, decision trees, rheumatology, diagnostic medicine, NSAIDS</keywords><publishedDay>31</publishedDay><publishedMonth>3</publishedMonth><publishedYear>2023</publishedYear><publishedDate>2023-03-31</publishedDate><doi>10.1371/journal.pone.0279076</doi><url>http://dx.doi.org/10.1371/journal.pone.0279076</url><notes/><college>COLLEGE NANME</college><department>Health Data Science</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>HDAT</DepartmentCode><institution>Swansea University</institution><apcterm>SU Library paid the OA fee (TA Institutional Deal)</apcterm><funders>This work was supported by UCB Pharma, Health Data Research UK, and the infrastructure support of the National Centre for Population Health and Wellbeing and the SAIL Databank.</funders><projectreference/><lastEdited>2023-08-22T16:11:21.8536461</lastEdited><Created>2023-07-18T16:02:44.6016722</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - Health Data Science</level></path><authors><author><firstname>Jonathan</firstname><surname>Kennedy</surname><orcid/><order>1</order></author><author><firstname>Tash 
Kennedy</firstname><surname>Kennedy</surname><order>2</order></author><author><firstname>Roxanne</firstname><surname>Cooksey</surname><orcid>0000-0002-6763-9373</orcid><order>3</order></author><author><firstname>Ernest</firstname><surname>Choy</surname><order>4</order></author><author><firstname>Stefan</firstname><surname>Siebert</surname><orcid>0000-0002-1802-7311</orcid><order>5</order></author><author><firstname>Muhammad</firstname><surname>Rahman</surname><order>6</order></author><author><firstname>Sinead</firstname><surname>Brophy</surname><orcid>0000-0001-7417-2858</orcid><order>7</order></author></authors><documents><document><filename>63908__28141__3dd5f031e4f44f3e8b07f3404b5b00bb.pdf</filename><originalFilename>63908.VOR.pdf</originalFilename><uploaded>2023-07-18T16:06:49.6766065</uploaded><type>Output</type><contentLength>1001618</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: © 2023 Kennedy et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807> |
title |
Predicting a diagnosis of ankylosing spondylitis using primary care health records–A machine learning approach |
author_id_str_mv |
08163d1f58d7fefcb1c695bcc2e0ef68 3f6f07de33204db4c0ab665fb4b36367 df63826249b712dcb03cb0161d0f3daf 84f5661b35a729f55047f9e793d8798b |
author_id_fullname_str_mv |
08163d1f58d7fefcb1c695bcc2e0ef68_***_Jonathan Kennedy 3f6f07de33204db4c0ab665fb4b36367_***_Tash Kennedy Kennedy df63826249b712dcb03cb0161d0f3daf_***_Roxanne Cooksey 84f5661b35a729f55047f9e793d8798b_***_Sinead Brophy |
author |
Jonathan Kennedy Tash Kennedy Kennedy Roxanne Cooksey Sinead Brophy |
author2 |
Jonathan Kennedy Tash Kennedy Kennedy Roxanne Cooksey Ernest Choy Stefan Siebert Muhammad Rahman Sinead Brophy |
format |
Journal article |
container_title |
PLOS ONE |
container_volume |
18 |
container_issue |
3 |
container_start_page |
e0279076 |
publishDate |
2023 |
institution |
Swansea University |
issn |
1932-6203 |
doi_str_mv |
10.1371/journal.pone.0279076 |
publisher |
Public Library of Science (PLoS) |
college_str |
Faculty of Medicine, Health and Life Sciences |
hierarchytype |
|
hierarchy_top_id |
facultyofmedicinehealthandlifesciences |
hierarchy_top_title |
Faculty of Medicine, Health and Life Sciences |
hierarchy_parent_id |
facultyofmedicinehealthandlifesciences |
hierarchy_parent_title |
Faculty of Medicine, Health and Life Sciences |
department_str |
Swansea University Medical School - Health Data Science{{{_:::_}}}Faculty of Medicine, Health and Life Sciences{{{_:::_}}}Swansea University Medical School - Health Data Science |
url |
http://dx.doi.org/10.1371/journal.pone.0279076 |
document_store_str |
1 |
active_str |
0 |
description |
Ankylosing spondylitis (AS) is the second most common cause of inflammatory arthritis. However, a successful diagnosis can take a decade to confirm from symptom onset (via x-rays). The aim of this study was to use machine learning methods to develop a profile of the characteristics of people who are likely to be given a diagnosis of AS in the future. The Secure Anonymised Information Linkage databank was used. Patients with ankylosing spondylitis were identified using their routine data and matched with controls who had no record of a diagnosis of ankylosing spondylitis or axial spondyloarthritis. Data were analysed separately for men and women. The model was developed using feature/variable selection and principal component analysis to develop decision trees. The decision tree with the highest average F value was selected and validated with a test dataset. The model for men indicated that lower back pain, uveitis, and NSAID use under age 20 are associated with AS development. The model for women showed an older age of symptom presentation compared to men, with back pain and multiple pain relief medications. The models showed good prediction (positive predictive value 70%-80%) in test data, but in the general population, where prevalence is very low (0.09% of the population in this dataset), the positive predictive value would be very low (0.25%-0.33%). Machine learning can be used to help profile and understand the characteristics of people who will develop AS, and it performs well in test datasets with artificially high prevalence. However, when applied to a general population with low prevalence rates, such as that in primary care, the positive predictive value for even the best model would be 1.4%. Multiple models may be needed to narrow down the population over time to improve the predictive value and therefore reduce the time to diagnosis of ankylosing spondylitis. |
published_date |
2023-03-31T16:11:22Z |
_version_ |
1774942507506860032 |
score |
11.037166 |
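
The abstract's point that a model with a 70%-80% positive predictive value (PPV) on matched test data can have a PPV well under 1% in the general population follows from Bayes' theorem. The sketch below is illustrative only: the 0.09% prevalence comes from the abstract, while the sensitivity and specificity values and the 50% test-set prevalence are hypothetical assumptions chosen for the example, not figures reported in the record.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return P(disease | positive prediction) via Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical model performance (assumed for illustration, not from the paper).
SENSITIVITY = 0.75
SPECIFICITY = 0.75

# Assumed case-control test set with equal numbers of cases and controls.
ppv_test_set = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.5)

# General-population prevalence of 0.09%, as quoted in the abstract.
ppv_population = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.0009)

print(f"PPV in balanced test data: {ppv_test_set:.1%}")
print(f"PPV at 0.09% prevalence:   {ppv_population:.2%}")
```

With the same sensitivity and specificity, lowering the prevalence alone drives the PPV down sharply, which is consistent with the abstract's suggestion that successive models may be needed to narrow the population before prediction becomes useful.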