Exploring Human Activity Recognition with Acoustic Data: A Comparative Study of CNN-LSTM, ViViT, and ResNet-Temporal Transformer Model

HUMAIDAN, AALAA; Roy, Jeny; Sharifzadeh, Sara; Mehta, Ruchita; Tales, Andrea; MacInnes, Joe

Conference Paper/Proceeding/Abstract

AALAA HUMAIDAN, Jeny Roy, Sara Sharifzadeh, Ruchita Mehta, Andrea Tales, Joe MacInnes

2025 IEEE Symposium on Computational Intelligence in Image, Signal Processing and Synthetic Media, Issue: IEEE Symposium Series on Computational Intelligence

Swansea University Authors: AALAA HUMAIDAN, Jeny Roy, Sara Sharifzadeh, Andrea Tales, Joe MacInnes

PDF | Accepted Manuscript

Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).
Download (310.58KB)

Published in:	2025 IEEE Symposium on Computational Intelligence in Image, Signal Processing and Synthetic Media
Published:	IEEE
URI:	https://cronfa.swan.ac.uk/Record/cronfa69135

first_indexed	2025-03-22T16:01:52Z
last_indexed	2025-04-29T04:40:41Z
id	cronfa69135
recordtype	SURis
fullrecord	<?xml version="1.0"?><rfc1807><datestamp>2025-04-28T12:03:34.3249478</datestamp><bib-version>v2</bib-version><id>69135</id><entry>2025-03-22</entry><title>Exploring Human Activity Recognition with Acoustic Data: A Comparative Study of CNN-LSTM, ViViT, and ResNet-Temporal Transformer Model</title><swanseaauthors><author><sid>9d418b788707447f2ad49125fa3867cf</sid><firstname>AALAA</firstname><surname>HUMAIDAN</surname><name>AALAA HUMAIDAN</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>00fd1a17e776f2e1d532ab4125995465</sid><ORCID>0009-0006-3354-7557</ORCID><firstname>Jeny</firstname><surname>Roy</surname><name>Jeny Roy</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>a4e15f304398ecee3f28c7faec69c1b0</sid><ORCID>0000-0003-4621-2917</ORCID><firstname>Sara</firstname><surname>Sharifzadeh</surname><name>Sara Sharifzadeh</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>9b53a866ddacb566c38ee336706aef5f</sid><ORCID>0000-0003-4825-4555</ORCID><firstname>Andrea</firstname><surname>Tales</surname><name>Andrea Tales</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>06dcb003ec50192bafde2c77bef4fd5c</sid><ORCID>0000-0002-5134-1601</ORCID><firstname>Joe</firstname><surname>MacInnes</surname><name>Joe MacInnes</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2025-03-22</date><deptcode>MACS</deptcode><abstract/><type>Conference Paper/Proceeding/Abstract</type><journal>2025 IEEE Symposium on Computational Intelligence in Image, Signal Processing and Synthetic Media</journal><volume/><journalNumber>IEEE Symposium Series on Computational Intelligence,</journalNumber><paginationStart/><paginationEnd/><publisher>IEEE</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords/><publishedDay>0</publishedDay><publishedMonth>0</publishedMonth><publishedYear>0</publishedYear><publishedDate>0001-01-01</publishedDate><doi/><url/><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm>Other</apcterm><funders>The project is partly funded by the IAA project and the data set used is based on my previous work at Coventry University and a funded PhD proposal</funders><projectreference/><lastEdited>2025-04-28T12:03:34.3249478</lastEdited><Created>2025-03-22T11:51:08.4451291</Created><path><level id="1">Faculty of Medicine, Health and Life Sciences</level><level id="2">School of Health and Social Care - Public Health</level></path><authors><author><firstname>AALAA</firstname><surname>HUMAIDAN</surname><order>1</order></author><author><firstname>Jeny</firstname><surname>Roy</surname><orcid>0009-0006-3354-7557</orcid><order>2</order></author><author><firstname>Sara</firstname><surname>Sharifzadeh</surname><orcid>0000-0003-4621-2917</orcid><order>3</order></author><author><firstname>Ruchita</firstname><surname>Mehta</surname><order>4</order></author><author><firstname>Andrea</firstname><surname>Tales</surname><orcid>0000-0003-4825-4555</orcid><order>5</order></author><author><firstname>Joe</firstname><surname>MacInnes</surname><orcid>0000-0002-5134-1601</orcid><order>6</order></author></authors><documents><document><filename>69135__33859__8b03d12386bc415886fd571aded0b2ac.pdf</filename><originalFilename>IEEE_Conference_Template_Revised_Submission.pdf</originalFilename><uploaded>2025-03-22T13:13:21.7234283</uploaded><type>Output</type><contentLength>318034</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><embargoDate>2025-04-22T00:00:00.0000000</embargoDate><documentNotes>Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/deed.en</licence></document></documents><OutputDurs/></rfc1807>
title	Exploring Human Activity Recognition with Acoustic Data: A Comparative Study of CNN-LSTM, ViViT, and ResNet-Temporal Transformer Model
spellingShingle	Exploring Human Activity Recognition with Acoustic Data: A Comparative Study of CNN-LSTM, ViViT, and ResNet-Temporal Transformer Model AALAA HUMAIDAN Jeny Roy Sara Sharifzadeh Andrea Tales Joe MacInnes
title_short	Exploring Human Activity Recognition with Acoustic Data: A Comparative Study of CNN-LSTM, ViViT, and ResNet-Temporal Transformer Model
title_full	Exploring Human Activity Recognition with Acoustic Data: A Comparative Study of CNN-LSTM, ViViT, and ResNet-Temporal Transformer Model
title_fullStr	Exploring Human Activity Recognition with Acoustic Data: A Comparative Study of CNN-LSTM, ViViT, and ResNet-Temporal Transformer Model
title_full_unstemmed	Exploring Human Activity Recognition with Acoustic Data: A Comparative Study of CNN-LSTM, ViViT, and ResNet-Temporal Transformer Model
title_sort	Exploring Human Activity Recognition with Acoustic Data: A Comparative Study of CNN-LSTM, ViViT, and ResNet-Temporal Transformer Model
author_id_str_mv	9d418b788707447f2ad49125fa3867cf 00fd1a17e776f2e1d532ab4125995465 a4e15f304398ecee3f28c7faec69c1b0 9b53a866ddacb566c38ee336706aef5f 06dcb003ec50192bafde2c77bef4fd5c
author_id_fullname_str_mv	9d418b788707447f2ad49125fa3867cf_***_AALAA HUMAIDAN 00fd1a17e776f2e1d532ab4125995465_***_Jeny Roy a4e15f304398ecee3f28c7faec69c1b0_***_Sara Sharifzadeh 9b53a866ddacb566c38ee336706aef5f_***_Andrea Tales 06dcb003ec50192bafde2c77bef4fd5c_***_Joe MacInnes
author	AALAA HUMAIDAN Jeny Roy Sara Sharifzadeh Andrea Tales Joe MacInnes
author2	AALAA HUMAIDAN Jeny Roy Sara Sharifzadeh Ruchita Mehta Andrea Tales Joe MacInnes
format	Conference Paper/Proceeding/Abstract
container_title	2025 IEEE Symposium on Computational Intelligence in Image, Signal Processing and Synthetic Media
container_issue	IEEE Symposium Series on Computational Intelligence,
institution	Swansea University
publisher	IEEE
college_str	Faculty of Medicine, Health and Life Sciences
hierarchytype
hierarchy_top_id	facultyofmedicinehealthandlifesciences
hierarchy_top_title	Faculty of Medicine, Health and Life Sciences
hierarchy_parent_id	facultyofmedicinehealthandlifesciences
hierarchy_parent_title	Faculty of Medicine, Health and Life Sciences
department_str	School of Health and Social Care - Public Health{{{_:::_}}}Faculty of Medicine, Health and Life Sciences{{{_:::_}}}School of Health and Social Care - Public Health
document_store_str	1
active_str	0
published_date	0001-01-01T19:06:29Z
_version_	1831848420316282880
score	11.058631

Similar Items