Journal article
Ethnicity data resource in population-wide health records: completeness, coverage and granularity of diversity
Scientific Data, Volume: 11, Issue: 1
Swansea University Author: Ashley Akbari
PDF | Version of Record
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
DOI (Published version): 10.1038/s41597-024-02958-1
Abstract
Intersectional social determinants including ethnicity are vital in health research. We curated a population-wide data resource of self-identified ethnicity data from over 60 million individuals in England primary care, linking it to hospital records. We assessed ethnicity data in terms of completen...
Published in: | Scientific Data |
---|---|
ISSN: | 2052-4463 |
Published: | Springer Science and Business Media LLC, 2024 |
URI: | https://cronfa.swan.ac.uk/Record/cronfa65674 |
first_indexed |
2024-03-05T10:24:05Z |
---|---|
last_indexed |
2024-11-25T14:16:36Z |
id |
cronfa65674 |
recordtype |
SURis |
fullrecord |
<?xml version="1.0"?><rfc1807><datestamp>2024-03-26T12:01:29.8824949</datestamp><bib-version>v2</bib-version><id>65674</id><entry>2024-02-23</entry><title>Ethnicity data resource in population-wide health records: completeness, coverage and granularity of diversity</title><swanseaauthors><author><sid>aa1b025ec0243f708bb5eb0a93d6fb52</sid><ORCID>0000-0003-0814-0801</ORCID><firstname>Ashley</firstname><surname>Akbari</surname><name>Ashley Akbari</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2024-02-23</date><deptcode>MEDS</deptcode><abstract>Intersectional social determinants including ethnicity are vital in health research. We curated a population-wide data resource of self-identified ethnicity data from over 60 million individuals in England primary care, linking it to hospital records. We assessed ethnicity data in terms of completeness, consistency, and granularity and found one in ten individuals do not have ethnicity information recorded in primary care. By linking to hospital records, ethnicity data were completed for 94% of individuals. By reconciling SNOMED-CT concepts and census-level categories into a consistent hierarchy, we organised more than 250 ethnicity sub-groups including and beyond “White”, “Black”, “Asian”, “Mixed” and “Other”, and found them to be distributed in proportions similar to the general population. This large observational dataset presents an algorithmic hierarchy to represent self-identified ethnicity data collected across heterogeneous healthcare settings.
Accurate and easily accessible ethnicity data can lead to a better understanding of population diversity, which is important to address disparities and influence policy recommendations that can translate into better, fairer health for all.</abstract><type>Journal Article</type><journal>Scientific Data</journal><volume>11</volume><journalNumber>1</journalNumber><paginationStart/><paginationEnd/><publisher>Springer Science and Business Media LLC</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic>2052-4463</issnElectronic><keywords/><publishedDay>22</publishedDay><publishedMonth>2</publishedMonth><publishedYear>2024</publishedYear><publishedDate>2024-02-22</publishedDate><doi>10.1038/s41597-024-02958-1</doi><url/><notes>Data availability: The data used in this study are available in NHS England’s Secure Data Environment (SDE) service for England (https://digital.nhs.uk/services/secure-data-environment-service). The CVD-COVID-UK/COVID-IMPACT programme led by the BHF Data Science Centre (https://bhfdatasciencecentre.org/) received approval to access data in NHS England’s SDE service for England from the Independent Group Advising on the Release of Data (IGARD) (https://digital.nhs.uk/about-nhs-digital/corporate-information-and-documents/independent-group-advising-on-the-release-of-data) via an application made in the Data Access Request Service (DARS) Online system (ref. DARS-NIC-381078-Y9C5K) (https://digital.nhs.uk/services/data-access-request-service-dars/dars-products-and-services). The CVD-COVID-UK/COVID-IMPACT Approvals &amp; Oversight Board (https://bhfdatasciencecentre.org/areas/cvd-covid-uk-covid-impact/) subsequently granted approval to this project to access the data within NHS England’s SDE service for England. The de-identified data used in this study were made available to accredited researchers.
Those wishing to gain access to the data should contact bhfdsc@hdruk.ac.uk in the first instance.</notes><college>COLLEGE NANME</college><department>Medical School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MEDS</DepartmentCode><institution>Swansea University</institution><apcterm/><funders>The British Heart Foundation Data Science Centre (grant No SP/19/3/34678, awarded to Health Data Research (HDR) UK), funded co-development (with NHS England) of the Secure Data Environment service for England, provision of linked datasets, data access, user software licences, computational usage, and data management and wrangling support, with additional contributions from the HDR UK Data and Connectivity component of the UK Government Chief Scientific Adviser’s National Core Studies programme to coordinate national COVID-19 priority research. Consortium partner organisations funded the time of contributing data analysts, biostatisticians, epidemiologists, and clinicians. The authors acknowledge English language editing by Dr Jennifer A de Beyer and Amelia M Doran, Centre for Statistics in Medicine, University of Oxford. This work was carried out with the support of the BHF Data Science Centre led by HDR UK (BHF Grant no. SP/19/3/34678). This study made use of de-identified data held in NHS England’s Secure Data Environment service for England and made available via the BHF Data Science Centre’s CVD-COVID-UK/COVID-IMPACT consortium. This work used data provided by patients and collected by the NHS as part of their care and support. We would like to acknowledge all data providers who make health relevant data available for research. This research is part of the Data and Connectivity National Core Study, led by Health Data Research UK in partnership with the Office for National Statistics and funded by UK Research and Innovation (grant ref MC_PC_20058). This work was also supported by The Alan Turing Institute via ‘Towards Turing 2.0’ EPSRC Grant Funding. 
The funders had no role in the study design, data collection, data analysis, data interpretation, or report writing.</funders><projectreference/><lastEdited>2024-03-26T12:01:29.8824949</lastEdited><Created>2024-02-23T11:31:44.4872770</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">Swansea University Medical School - Health Data Science</level></path><authors><author><firstname>Marta</firstname><surname>Pineda-Moncusí</surname><orcid>0000-0003-0567-0137</orcid><order>1</order></author><author><firstname>Freya</firstname><surname>Allery</surname><order>2</order></author><author><firstname>Antonella</firstname><surname>Delmestri</surname><orcid>0000-0003-0388-3403</orcid><order>3</order></author><author><firstname>Thomas</firstname><surname>Bolton</surname><order>4</order></author><author><firstname>John</firstname><surname>Nolan</surname><order>5</order></author><author><firstname>Johan H.</firstname><surname>Thygesen</surname><orcid>0000-0002-7479-3459</orcid><order>6</order></author><author><firstname>Alex</firstname><surname>Handy</surname><order>7</order></author><author><firstname>Amitava</firstname><surname>Banerjee</surname><order>8</order></author><author><firstname>Spiros</firstname><surname>Denaxas</surname><order>9</order></author><author><firstname>Christopher</firstname><surname>Tomlinson</surname><orcid>0000-0002-0903-5395</orcid><order>10</order></author><author><firstname>Alastair K.</firstname><surname>Denniston</surname><order>11</order></author><author><firstname>Cathie</firstname><surname>Sudlow</surname><order>12</order></author><author><firstname>Ashley</firstname><surname>Akbari</surname><orcid>0000-0003-0814-0801</orcid><order>13</order></author><author><firstname>Angela</firstname><surname>Wood</surname><order>14</order></author><author><firstname>Gary 
S.</firstname><surname>Collins</surname><orcid>0000-0002-2772-2316</orcid><order>15</order></author><author><firstname>Irene</firstname><surname>Petersen</surname><order>16</order></author><author><firstname>Laura C.</firstname><surname>Coates</surname><order>17</order></author><author><firstname>Kamlesh</firstname><surname>Khunti</surname><order>18</order></author><author><firstname>Daniel</firstname><surname>Prieto-Alhambra</surname><order>19</order></author><author><firstname>Sara</firstname><surname>Khalid</surname><order>20</order></author><author><firstname>(on behalf of the CVD-COVID-UK/COVID-IMPACT</firstname><surname>Consortium)</surname><order>21</order></author></authors><documents><document><filename>65674__29631__a3b5ed1bf38341f5be96a94a6cc557b7.pdf</filename><originalFilename>65674.pdf</originalFilename><uploaded>2024-03-05T10:23:04.2807052</uploaded><type>Output</type><contentLength>2173444</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>This article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.</documentNotes><copyrightCorrect>false</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807> |
title |
Ethnicity data resource in population-wide health records: completeness, coverage and granularity of diversity |
author_id_str_mv |
aa1b025ec0243f708bb5eb0a93d6fb52 |
author_id_fullname_str_mv |
aa1b025ec0243f708bb5eb0a93d6fb52_***_Ashley Akbari |
author |
Ashley Akbari |
author2 |
Marta Pineda-Moncusí Freya Allery Antonella Delmestri Thomas Bolton John Nolan Johan H. Thygesen Alex Handy Amitava Banerjee Spiros Denaxas Christopher Tomlinson Alastair K. Denniston Cathie Sudlow Ashley Akbari Angela Wood Gary S. Collins Irene Petersen Laura C. Coates Kamlesh Khunti Daniel Prieto-Alhambra Sara Khalid (on behalf of the CVD-COVID-UK/COVID-IMPACT Consortium) |
format |
Journal article |
container_title |
Scientific Data |
container_volume |
11 |
container_issue |
1 |
publishDate |
2024 |
institution |
Swansea University |
issn |
2052-4463 |
doi_str_mv |
10.1038/s41597-024-02958-1 |
publisher |
Springer Science and Business Media LLC |
college_str |
Faculty of Medicine, Health and Life Sciences |
hierarchytype |
|
hierarchy_top_id |
facultyofmedicinehealthandlifesciences |
hierarchy_top_title |
Faculty of Medicine, Health and Life Sciences |
hierarchy_parent_id |
facultyofmedicinehealthandlifesciences |
hierarchy_parent_title |
Faculty of Medicine, Health and Life Sciences |
department_str |
Swansea University Medical School - Health Data Science{{{_:::_}}}Faculty of Medicine, Health and Life Sciences{{{_:::_}}}Swansea University Medical School - Health Data Science |
document_store_str |
1 |
active_str |
0 |
description |
Intersectional social determinants including ethnicity are vital in health research. We curated a population-wide data resource of self-identified ethnicity data from over 60 million individuals in England primary care, linking it to hospital records. We assessed ethnicity data in terms of completeness, consistency, and granularity and found one in ten individuals do not have ethnicity information recorded in primary care. By linking to hospital records, ethnicity data were completed for 94% of individuals. By reconciling SNOMED-CT concepts and census-level categories into a consistent hierarchy, we organised more than 250 ethnicity sub-groups including and beyond “White”, “Black”, “Asian”, “Mixed” and “Other”, and found them to be distributed in proportions similar to the general population. This large observational dataset presents an algorithmic hierarchy to represent self-identified ethnicity data collected across heterogeneous healthcare settings. Accurate and easily accessible ethnicity data can lead to a better understanding of population diversity, which is important to address disparities and influence policy recommendations that can translate into better, fairer health for all. |
published_date |
2024-02-22T08:28:18Z |