Conference Paper/Proceeding/Abstract

Using Elo Rating as a Metric for Comparative Judgement in Educational Assessment

Andy Gray, Alma A. M. Rahat, Tom Crick, Stephen Lindsay, Darren Wallace

Proceedings of the 6th International Conference on Education and Multimedia Technology, Volume: ICEMT'22, Issue: July 2022, Pages: 272 - 278

Swansea University Authors: Andrew Gray, Alma Rahat, Tom Crick

Full text not available from this repository: check for access using links below.

DOI (Published version): 10.1145/3551708.3556204

Abstract

Marking and feedback are essential features of teaching and learning across the overwhelming majority of educational settings and contexts. However, it can take a great deal of time and effort for teachers to mark assessments and to provide useful feedback to students. It also creates a significant cognitive load on assessors, especially in ensuring fairness and equity. An alternative approach to marking, comparative judgement (CJ), has therefore been proposed in the educational space, inspired by the law of comparative judgement (LCJ). The key idea is that a suitably qualified or experienced assessor identifies the better submission in a pair; collecting such pairwise comparisons over as many pairs as possible can then be used to rank all submissions. Studies suggest that CJ is highly reliable and accurate while keeping the process quick for teachers. Other studies have questioned this claim, suggesting that the process can increase bias in the results because the same submission is shown to an assessor many times to improve reliability. Studies have also found that CJ can make the overall marking process take longer than traditional marking, since information about many pairs must be collected. There is therefore a clear need to investigate rating and ranking systems that do not require extensive data on every pair of submissions, in order to reduce the temporal and cognitive burden on assessors and the bias from observing the same submission repeatedly. In this paper, we investigate Elo, a rating system used extensively for players in zero-sum games such as chess, for deriving a ranking of submissions in a comparative judgement context. We experimented on a large-scale Twitter dataset about a recent major UK political event ("Brexit", the UK's exit from the European Union), asking users which tweet they found funnier between pairs selected from ten tweets. Our analysis reveals that the Elo ranking is statistically significantly similar to the CJ ranking, with a Kendall's tau score of 0.96 and a p-value of 1.5 × 10^−5. We finish with an informed discussion of the potential wider application of this approach to a range of educational contexts.
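
The abstract describes the Elo-based ranking and the Kendall's tau comparison only at a high level, so the following Python sketch illustrates one way such a pipeline could be wired up. It is not the authors' implementation: the K-factor of 32, the initial rating of 1000, the placeholder judgement pairs, and the SciPy-based tau computation are assumptions made purely for illustration.

from scipy.stats import kendalltau

def expected_score(r_a, r_b):
    # Expected probability that the item rated r_a beats the item rated r_b (Elo model).
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_winner, r_loser, k=32):
    # Update both ratings after one pairwise judgement in which the first item won.
    new_winner = r_winner + k * (1 - expected_score(r_winner, r_loser))
    new_loser = r_loser + k * (0 - expected_score(r_loser, r_winner))
    return new_winner, new_loser

# Ten submissions (e.g. tweets), all starting from an assumed baseline rating.
ratings = {i: 1000.0 for i in range(10)}

# Placeholder judgements: (winner, loser) pairs collected from assessors.
judgements = [(0, 1), (2, 3), (0, 2), (4, 5), (6, 7), (8, 9)]
for winner, loser in judgements:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])

# Convert each ranking into per-item rank positions before correlating them,
# then compare the Elo ordering with a reference (e.g. CJ-derived) ordering.
items = sorted(ratings)
elo_order = sorted(items, key=ratings.get, reverse=True)
reference_order = list(items)  # placeholder comparative-judgement ranking
elo_pos = {item: pos for pos, item in enumerate(elo_order)}
ref_pos = {item: pos for pos, item in enumerate(reference_order)}
tau, p_value = kendalltau([elo_pos[i] for i in items], [ref_pos[i] for i in items])
print(f"Kendall's tau = {tau:.2f}, p = {p_value:.2e}")

A high tau (close to 1) would indicate that the cheaper Elo ordering closely reproduces the reference CJ ranking, which is the comparison reported in the paper.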


Published in: Proceedings of the 6th International Conference on Education and Multimedia Technology
ISBN: 978-1-4503-9645-5
Published: New York, NY, USA: ACM, 13 July 2022
Online Access: http://dx.doi.org/10.1145/3551708.3556204
URI: https://cronfa.swan.ac.uk/Record/cronfa60788