Conference Paper/Proceeding/Abstract
Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs
Interspeech 2021
Swansea University Author: Julian Hough
Full text not available from this repository: check for access using links below.
DOI (Published version): 10.21437/interspeech.2021-1633
Abstract
We present two multimodal fusion-based deep learning models that consume ASR-transcribed speech and acoustic data simultaneously to classify whether a speaker in a structured diagnostic task has Alzheimer's Disease and to what degree, evaluated on the ADReSSo Challenge 2021 data. Our best model,...
Published in: | Interspeech 2021 |
---|---|
Published: | ISCA, 2021 |
URI: | https://cronfa.swan.ac.uk/Record/cronfa64932 |
first_indexed | 2023-11-07T22:06:25Z |
---|---|
last_indexed | 2024-11-25T14:15:01Z |
id | cronfa64932 |
recordtype | SURis |
fullrecord |
<?xml version="1.0"?><rfc1807><datestamp>2024-07-11T14:15:59.7177201</datestamp><bib-version>v2</bib-version><id>64932</id><entry>2023-11-07</entry><title>Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs</title><swanseaauthors><author><sid>082d773ae261d2bbf49434dd2608ab40</sid><ORCID>0000-0002-4345-6759</ORCID><firstname>Julian</firstname><surname>Hough</surname><name>Julian Hough</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2023-11-07</date><deptcode>MACS</deptcode><abstract>We present two multimodal fusion-based deep learning models that consume ASR-transcribed speech and acoustic data simultaneously to classify whether a speaker in a structured diagnostic task has Alzheimer's Disease and to what degree, evaluated on the ADReSSo Challenge 2021 data. Our best model, a BiLSTM with highway layers using words, word probabilities, disfluency features, pause information, and a variety of acoustic features, achieves an accuracy of 84% and an RMSE of 4.26 in predicting MMSE cognitive scores. While predicting cognitive decline is more challenging, our models show improvement using the multimodal approach and word probabilities, disfluency and pause information over word-only models. 
We show considerable gains for AD classification using multimodal fusion and gating, which can effectively deal with noisy inputs from acoustic features and ASR hypotheses.</abstract><type>Conference Paper/Proceeding/Abstract</type><journal>Interspeech 2021</journal><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher>ISCA</publisher><placeOfPublication>ISCA</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords/><publishedDay>30</publishedDay><publishedMonth>8</publishedMonth><publishedYear>2021</publishedYear><publishedDate>2021-08-30</publishedDate><doi>10.21437/interspeech.2021-1633</doi><url/><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm>Not Required</apcterm><funders/><projectreference/><lastEdited>2024-07-11T14:15:59.7177201</lastEdited><Created>2023-11-07T22:02:32.7357830</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>Morteza</firstname><surname>Rohanian</surname><order>1</order></author><author><firstname>Julian</firstname><surname>Hough</surname><orcid>0000-0002-4345-6759</orcid><order>2</order></author><author><firstname>Matthew</firstname><surname>Purver</surname><order>3</order></author></authors><documents/><OutputDurs/></rfc1807> |
title | Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs |
author_id_str_mv | 082d773ae261d2bbf49434dd2608ab40 |
author_id_fullname_str_mv | 082d773ae261d2bbf49434dd2608ab40_***_Julian Hough |
author | Julian Hough |
author2 | Morteza Rohanian Julian Hough Matthew Purver |
format | Conference Paper/Proceeding/Abstract |
container_title | Interspeech 2021 |
publishDate | 2021 |
institution | Swansea University |
doi_str_mv | 10.21437/interspeech.2021-1633 |
publisher | ISCA |
college_str | Faculty of Science and Engineering |
hierarchytype | |
hierarchy_top_id | facultyofscienceandengineering |
hierarchy_top_title | Faculty of Science and Engineering |
hierarchy_parent_id | facultyofscienceandengineering |
hierarchy_parent_title | Faculty of Science and Engineering |
department_str | School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science |
document_store_str | 0 |
active_str | 0 |
description | We present two multimodal fusion-based deep learning models that consume ASR-transcribed speech and acoustic data simultaneously to classify whether a speaker in a structured diagnostic task has Alzheimer's Disease and to what degree, evaluated on the ADReSSo Challenge 2021 data. Our best model, a BiLSTM with highway layers using words, word probabilities, disfluency features, pause information, and a variety of acoustic features, achieves an accuracy of 84% and an RMSE of 4.26 in predicting MMSE cognitive scores. While predicting cognitive decline is more challenging, our models show improvement using the multimodal approach and word probabilities, disfluency and pause information over word-only models. We show considerable gains for AD classification using multimodal fusion and gating, which can effectively deal with noisy inputs from acoustic features and ASR hypotheses. |
published_date | 2021-08-30T20:26:21Z |
_version_ | 1821347971232432128 |
score | 11.04748 |