
Journal article

ChatGPT performance on multiple choice question examinations in higher education. A pragmatic scoping review

Phil Newton (ORCID: 0000-0002-5272-7979), Maria Xiromeriti

Assessment and Evaluation in Higher Education, Pages: 1-18

Swansea University Authors: Phil Newton, Maria Xiromeriti

  • 65458_VoR.pdf (PDF, Version of Record, 2.96 MB)

    © 2024 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Abstract

Media coverage suggests that ChatGPT can pass examinations based on multiple choice questions (MCQs), including those used to qualify doctors, lawyers, scientists etc. This poses a potential risk to the integrity of those examinations. We reviewed current research evidence regarding the performance of ChatGPT on MCQ-based examinations in higher education, along with recommendations for how educators might address challenges and benefits arising from these data. 53 studies were included, covering 114 question sets, totalling 49,014 MCQs. Free versions of ChatGPT based upon GPT-3/3.5 performed better than random guessing but failed most examinations, performing significantly worse than the average human student. GPT-4 passed most examinations with a performance that was on a par with human subjects. These findings indicate that all summative MCQ-based assessments should be conducted under secure conditions with restricted access to ChatGPT and similar tools, particularly those examinations which assess foundational knowledge.

Published in: Assessment and Evaluation in Higher Education
ISSN: 0260-2938 (print), 1469-297X (electronic)
Published: Informa UK Limited, 17 January 2024
DOI: 10.1080/02602938.2023.2299059
Keywords: Artificial intelligence; academic integrity; cheating; evidence-based education; MCQs; pragmatism
Online Access: https://doi.org/10.1080/02602938.2023.2299059

URI: https://cronfa.swan.ac.uk/Record/cronfa65458