E-Thesis

Quantifying Underspecification in Machine Learning with Explainable AI / James Hinns

Swansea University Author: James Hinns


Published: Swansea 2022
Institution: Swansea University
Degree level: Master of Research
Degree name: MSc by Research
Supervisor: Roggenbach, Markus; Fan, Xiuyi
URI: https://cronfa.swan.ac.uk/Record/cronfa61751
Abstract: To evaluate a trained machine learning (ML) model’s performance, it is general practice to test it by predicting targets from a held-out test set. For such a dataset, various models with different reasoning can be constructed that produce near-optimal test performance. However, due to this variance in reasoning, some models generalise whilst others perform unexpectedly on further unseen data. The existence of multiple equally performing models exhibits underspecification of the ML pipeline used to produce them. Underspecification poses challenges to the credibility of such test-performance evaluations and has been identified as a key reason why many models that perform well in testing exhibit poor performance in deployment. In this work, we propose identifying underspecification by estimating the variance of reasoning within a set of near-optimal models produced by a pipeline, also called a Rashomon set. We iteratively train models using the same pipeline to produce an empirical Rashomon set of a fixed size. To quantify the variation of models within this Rashomon set, we measure the variation of the SHapley Additive exPlanations (SHAP) that the models produce, using a variety of metrics. This provides an index representing the variation of reasoning within the Rashomon set, and thus within the pipeline; the index therefore represents the extent of underspecification the pipeline exhibits. We provide an implementation of this approach and make it publicly available on GitHub. We validate that this implementation shows the trends we expect, using evaluation techniques previously used to prove the existence of underspecification. Furthermore, we demonstrate our approach on multiple datasets drawn from the literature, and in a COVID-19 virus transmission case study.
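The thesis's own implementation is linked from this record; purely as a rough illustration of the approach the abstract describes, the sketch below retrains a single scikit-learn pipeline under different random seeds, keeps the near-optimal models as an empirical Rashomon set, explains each model with SHAP, and reports the spread of the explanations as a single index. The function name underspecification_index, the epsilon threshold, and the choice of summed per-feature standard deviation as the variation metric are illustrative assumptions, not the exact method or metrics used in the thesis.

    # Minimal sketch (not the author's released code): empirical Rashomon
    # set via reseeded retraining, SHAP explanations, and one variation
    # metric as an underspecification index.
    import numpy as np
    import shap
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    def underspecification_index(X_train, y_train, X_test, y_test,
                                 n_models=20, epsilon=0.02):
        """Train n_models copies of the same pipeline, keep the near-optimal
        ones (within epsilon of the best test accuracy), and measure how much
        their SHAP explanations disagree."""
        models, scores = [], []
        for seed in range(n_models):
            model = RandomForestClassifier(n_estimators=100, random_state=seed)
            model.fit(X_train, y_train)
            models.append(model)
            scores.append(model.score(X_test, y_test))

        # Empirical Rashomon set: models within epsilon of the best score.
        best = max(scores)
        rashomon = [m for m, s in zip(models, scores) if s >= best - epsilon]

        # Mean absolute SHAP value per feature for each model, computed on a
        # shared background sample so explanations are directly comparable.
        background = X_test[:100]
        importances = []
        for model in rashomon:
            sv = shap.TreeExplainer(model).shap_values(background)
            if isinstance(sv, list):   # older SHAP: one array per class
                sv = sv[1]
            elif sv.ndim == 3:         # newer SHAP: (samples, features, classes)
                sv = sv[:, :, 1]
            importances.append(np.abs(sv).mean(axis=0))
        importances = np.stack(importances)

        # One simple variation metric (the thesis evaluates several):
        # per-feature standard deviation across the Rashomon set, summed
        # into a single index. Higher means more disagreement in reasoning.
        return importances.std(axis=0).sum()

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    print("underspecification index:", underspecification_index(X_tr, y_tr, X_te, y_te))

Because every model in the set clears the same test-accuracy bar, any disagreement in their SHAP attributions reflects differences in reasoning that the held-out test set alone cannot detect, which is exactly the underspecification the index is meant to expose.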
Item Description: ORCID identifier: https://orcid.org/0000-0002-4144-5757
Keywords: Underspecification, Explainable AI
College: Faculty of Science and Engineering