
Journal article

ChatGPT performance on multiple choice question examinations in higher education. A pragmatic scoping review

Phil Newton (ORCID: 0000-0002-5272-7979), Maria Xiromeriti

Assessment and Evaluation in Higher Education, Pages: 1-18

Swansea University Authors: Phil Newton, Maria Xiromeriti

  • 65458_VoR.pdf (PDF, Version of Record, 2.96 MB)

    © 2024 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Abstract

Media coverage suggests that ChatGPT can pass examinations based on multiple choice questions (MCQs), including those used to qualify doctors, lawyers, scientists etc. This poses a potential risk to the integrity of those examinations. We reviewed current research evidence regarding the performance of ChatGPT on MCQ-based examinations in higher education, along with recommendations for how educators might address challenges and benefits arising from these data. 53 studies were included, covering 114 question sets, totalling 49,014 MCQs. Free versions of ChatGPT based upon GPT-3/3.5 performed better than random guessing but failed most examinations, performing significantly worse than the average human student. GPT-4 passed most examinations with a performance that was on a par with human subjects. These findings indicate that all summative MCQ-based assessments should be conducted under secure conditions with restricted access to ChatGPT and similar tools, particularly those examinations which assess foundational knowledge.

Published in: Assessment and Evaluation in Higher Education
ISSN: 0260-2938 (print), 1469-297X (electronic)
Published: Informa UK Limited, 17 January 2024
DOI: 10.1080/02602938.2023.2299059
Keywords: Artificial intelligence; academic integrity; cheating; evidence-based education; MCQs; pragmatism
Online Access: https://doi.org/10.1080/02602938.2023.2299059

URI: https://cronfa.swan.ac.uk/Record/cronfa65458