Journal article

From Metrics to Meaning: Time to Rethink Evaluation in Human–AI Collaborative Design

Sean Walton, Ben Evans, Alma Rahat, James Stovold, Jakub Vincalek

ACM Transactions on Interactive Intelligent Systems

Swansea University Authors: Sean Walton, Ben Evans, Alma Rahat

  • 59815814_File000001_1461343780.pdf

    PDF | Accepted Manuscript

    Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).

DOI (Published version): 10.1145/3773292

Abstract

As AI systems increasingly shape decision making in creative design contexts, understanding how humans engage with these tools has become a critical challenge for interactive intelligent systems research. This paper contributes a challenge to rethink how to evaluate human–AI collaborative systems,...

Published in: ACM Transactions on Interactive Intelligent Systems
ISSN: 2160-6455 2160-6463
Published: Association for Computing Machinery (ACM) 2025

URI: https://cronfa.swan.ac.uk/Record/cronfa70750
first_indexed 2025-10-22T07:33:29Z
last_indexed 2025-11-22T05:31:57Z
id cronfa70750
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2025-11-21T14:02:26.0785951</datestamp><bib-version>v2</bib-version><id>70750</id><entry>2025-10-22</entry><title>From Metrics to Meaning: Time to Rethink Evaluation in Human&#x2013;AI Collaborative Design</title><swanseaauthors><author><sid>0ec10d5e3ed3720a2d578417a894cf49</sid><ORCID>0000-0002-6451-265X</ORCID><firstname>Sean</firstname><surname>Walton</surname><name>Sean Walton</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>3d273fecc8121fe6b53b8fe5281b9c97</sid><ORCID>0000-0003-3662-9583</ORCID><firstname>Ben</firstname><surname>Evans</surname><name>Ben Evans</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>6206f027aca1e3a5ff6b8cd224248bc2</sid><ORCID>0000-0002-5023-1371</ORCID><firstname>Alma</firstname><surname>Rahat</surname><name>Alma Rahat</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2025-10-22</date><deptcode>MACS</deptcode><abstract>As AI systems increasingly shape decision making in creative design contexts, understanding how humans engage with these tools has become a critical challenge for interactive intelligent systems research. This paper contributes a challenge to rethink how to evaluate human--AI collaborative systems, advocating for a more nuanced and multidimensional approach. Findings from one of the largest field studies to date (n = 808) of a human--AI co-creative system, The Genetic Car Designer, complemented by a controlled lab study (n = 12) are presented. The system is based on an interactive evolutionary algorithm where participants were tasked with designing a simple two dimensional representation of a car. Participants were exposed to galleries of design suggestions generated by an intelligent system, MAP--Elites, and a random control. 
Results indicate that exposure to galleries generated by MAP--Elites significantly enhanced both cognitive and behavioural engagement, leading to higher-quality design outcomes. Crucially for the wider community, the analysis reveals that conventional evaluation methods, which often focus on solely behavioural and design quality metrics, fail to capture the full spectrum of user engagement. By considering the human--AI design process as a changing emotional, behavioural and cognitive state of the designer, we propose evaluating human--AI systems holistically and considering intelligent systems as a core part of the user experience---not simply a back end tool.</abstract><type>Journal Article</type><journal>ACM Transactions on Interactive Intelligent Systems</journal><volume>0</volume><journalNumber/><paginationStart/><paginationEnd/><publisher>Association for Computing Machinery (ACM)</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>2160-6455</issnPrint><issnElectronic>2160-6463</issnElectronic><keywords/><publishedDay>27</publishedDay><publishedMonth>10</publishedMonth><publishedYear>2025</publishedYear><publishedDate>2025-10-27</publishedDate><doi>10.1145/3773292</doi><url/><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm>Other</apcterm><funders>EPSRC (EP/S021892/1)</funders><projectreference/><lastEdited>2025-11-21T14:02:26.0785951</lastEdited><Created>2025-10-22T08:27:48.8561375</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer 
Science</level></path><authors><author><firstname>Sean</firstname><surname>Walton</surname><orcid>0000-0002-6451-265X</orcid><order>1</order></author><author><firstname>Ben</firstname><surname>Evans</surname><orcid>0000-0003-3662-9583</orcid><order>2</order></author><author><firstname>Alma</firstname><surname>Rahat</surname><orcid>0000-0002-5023-1371</orcid><order>3</order></author><author><firstname>James</firstname><surname>Stovold</surname><orcid>0000-0002-0708-2630</orcid><order>4</order></author><author><firstname>Jakub</firstname><surname>Vincalek</surname><orcid>0000-0002-5238-7126</orcid><order>5</order></author></authors><documents><document><filename>70750__35434__499b1d12f3d346e78fcb0edc255ed70e.pdf</filename><originalFilename>59815814_File000001_1461343780.pdf</originalFilename><uploaded>2025-10-22T08:33:07.0724641</uploaded><type>Output</type><contentLength>10818635</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><documentNotes>Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/deed.en</licence></document></documents><OutputDurs/></rfc1807>
title From Metrics to Meaning: Time to Rethink Evaluation in Human–AI Collaborative Design
spellingShingle From Metrics to Meaning: Time to Rethink Evaluation in Human–AI Collaborative Design
Sean Walton
Ben Evans
Alma Rahat
author_id_str_mv 0ec10d5e3ed3720a2d578417a894cf49
3d273fecc8121fe6b53b8fe5281b9c97
6206f027aca1e3a5ff6b8cd224248bc2
author_id_fullname_str_mv 0ec10d5e3ed3720a2d578417a894cf49_***_Sean Walton
3d273fecc8121fe6b53b8fe5281b9c97_***_Ben Evans
6206f027aca1e3a5ff6b8cd224248bc2_***_Alma Rahat
author Sean Walton
Ben Evans
Alma Rahat
author2 Sean Walton
Ben Evans
Alma Rahat
James Stovold
Jakub Vincalek
format Journal article
container_title ACM Transactions on Interactive Intelligent Systems
container_volume 0
publishDate 2025
institution Swansea University
issn 2160-6455
2160-6463
doi_str_mv 10.1145/3773292
publisher Association for Computing Machinery (ACM)
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science
document_store_str 1
active_str 0
description As AI systems increasingly shape decision making in creative design contexts, understanding how humans engage with these tools has become a critical challenge for interactive intelligent systems research. This paper contributes a challenge to rethink how to evaluate human–AI collaborative systems, advocating for a more nuanced and multidimensional approach. Findings from one of the largest field studies to date (n = 808) of a human–AI co-creative system, The Genetic Car Designer, complemented by a controlled lab study (n = 12), are presented. The system is based on an interactive evolutionary algorithm where participants were tasked with designing a simple two-dimensional representation of a car. Participants were exposed to galleries of design suggestions generated by an intelligent system, MAP-Elites, and a random control. Results indicate that exposure to galleries generated by MAP-Elites significantly enhanced both cognitive and behavioural engagement, leading to higher-quality design outcomes. Crucially for the wider community, the analysis reveals that conventional evaluation methods, which often focus solely on behavioural and design quality metrics, fail to capture the full spectrum of user engagement. By considering the human–AI design process as a changing emotional, behavioural and cognitive state of the designer, we propose evaluating human–AI systems holistically and considering intelligent systems as a core part of the user experience, not simply a back-end tool.
published_date 2025-10-27T05:31:36Z