Journal article
From Metrics to Meaning: Time to Rethink Evaluation in Human–AI Collaborative Design
ACM Transactions on Interactive Intelligent Systems
Swansea University Authors: Sean Walton, Ben Evans, Alma Rahat
PDF | Accepted Manuscript
Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).
DOI (Published version): 10.1145/3773292
Abstract
As AI systems increasingly shape decision making in creative design contexts, understanding how humans engage with these tools has become a critical challenge for interactive intelligent systems research. This paper contributes a challenge to rethink how to evaluate human–AI collaborative systems,...
| Published in: | ACM Transactions on Interactive Intelligent Systems |
|---|---|
| ISSN: | 2160-6455 (print), 2160-6463 (electronic) |
| Published: | Association for Computing Machinery (ACM), 2025 |
| URI: | https://cronfa.swan.ac.uk/Record/cronfa70750 |
| fullrecord |
<?xml version="1.0"?><rfc1807><datestamp>2025-11-21T14:02:26.0785951</datestamp><bib-version>v2</bib-version><id>70750</id><entry>2025-10-22</entry><title>From Metrics to Meaning: Time to Rethink Evaluation in Human–AI Collaborative Design</title><swanseaauthors><author><sid>0ec10d5e3ed3720a2d578417a894cf49</sid><ORCID>0000-0002-6451-265X</ORCID><firstname>Sean</firstname><surname>Walton</surname><name>Sean Walton</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>3d273fecc8121fe6b53b8fe5281b9c97</sid><ORCID>0000-0003-3662-9583</ORCID><firstname>Ben</firstname><surname>Evans</surname><name>Ben Evans</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>6206f027aca1e3a5ff6b8cd224248bc2</sid><ORCID>0000-0002-5023-1371</ORCID><firstname>Alma</firstname><surname>Rahat</surname><name>Alma Rahat</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2025-10-22</date><deptcode>MACS</deptcode><abstract>As AI systems increasingly shape decision making in creative design contexts, understanding how humans engage with these tools has become a critical challenge for interactive intelligent systems research. This paper contributes a challenge to rethink how to evaluate human–AI collaborative systems, advocating for a more nuanced and multidimensional approach. Findings from one of the largest field studies to date (n = 808) of a human–AI co-creative system, The Genetic Car Designer, complemented by a controlled lab study (n = 12) are presented. The system is based on an interactive evolutionary algorithm where participants were tasked with designing a simple two-dimensional representation of a car. Participants were exposed to galleries of design suggestions generated by an intelligent system, MAP-Elites, and a random control. 
Results indicate that exposure to galleries generated by MAP-Elites significantly enhanced both cognitive and behavioural engagement, leading to higher-quality design outcomes. Crucially for the wider community, the analysis reveals that conventional evaluation methods, which often focus solely on behavioural and design quality metrics, fail to capture the full spectrum of user engagement. By considering the human–AI design process as a changing emotional, behavioural and cognitive state of the designer, we propose evaluating human–AI systems holistically and considering intelligent systems as a core part of the user experience, not simply a back-end tool.</abstract><type>Journal Article</type><journal>ACM Transactions on Interactive Intelligent Systems</journal><volume>0</volume><journalNumber/><paginationStart/><paginationEnd/><publisher>Association for Computing Machinery (ACM)</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>2160-6455</issnPrint><issnElectronic>2160-6463</issnElectronic><keywords/><publishedDay>27</publishedDay><publishedMonth>10</publishedMonth><publishedYear>2025</publishedYear><publishedDate>2025-10-27</publishedDate><doi>10.1145/3773292</doi><url/><notes/><college>COLLEGE NAME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm>Other</apcterm><funders>EPSRC (EP/S021892/1)</funders><projectreference/><lastEdited>2025-11-21T14:02:26.0785951</lastEdited><Created>2025-10-22T08:27:48.8561375</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>Sean</firstname><surname>Walton</surname><orcid>0000-0002-6451-265X</orcid><order>1</order></author><author><firstname>Ben</firstname><surname>Evans</surname><orcid>0000-0003-3662-9583</orcid><order>2</order></author><author><firstname>Alma</firstname><surname>Rahat</surname><orcid>0000-0002-5023-1371</orcid><order>3</order></author><author><firstname>James</firstname><surname>Stovold</surname><orcid>0000-0002-0708-2630</orcid><order>4</order></author><author><firstname>Jakub</firstname><surname>Vincalek</surname><orcid>0000-0002-5238-7126</orcid><order>5</order></author></authors><documents><document><filename>70750__35434__499b1d12f3d346e78fcb0edc255ed70e.pdf</filename><originalFilename>59815814_File000001_1461343780.pdf</originalFilename><uploaded>2025-10-22T08:33:07.0724641</uploaded><type>Output</type><contentLength>10818635</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><documentNotes>Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/deed.en</licence></document></documents><OutputDurs/></rfc1807> |

