Journal article
Inferring Attention Shifts for Salient Instance Ranking
International Journal of Computer Vision, Volume: 132, Issue: 3, Pages: 964 - 986
Swansea University Authors: Avishek Siris, Gary Tam, Xianghua Xie, Rynson Lau
PDF | Version of Record
© The Author(s) 2023. This article is licensed under a Creative Commons Attribution 4.0 International License.
Download (4.23MB)
DOI (Published version): 10.1007/s11263-023-01906-7
Abstract
The human visual system has limited capacity in simultaneously processing multiple visual inputs. Consequently, humans rely on shifting their attention from one location to another. When viewing an image of complex scenes, psychology studies and behavioural observations show that humans prioritise and sequentially shift attention among multiple visual stimuli. In this paper, we propose to predict the saliency rank of multiple objects by inferring human attention shift. We first construct a new large-scale salient object ranking dataset, with the saliency rank of objects defined by the order that an observer attends to these objects via attention shift. We then propose a new deep learning based model to leverage both bottom-up and top-down attention mechanisms for saliency rank prediction. Our model includes three novel modules: Spatial Mask Module (SMM), Selective Attention Module (SAM) and Salient Instance Edge Module (SIEM). SMM integrates bottom-up and semantic object properties to enhance contextual object features, from which SAM learns the dependencies between object features and image features for saliency reasoning. SIEM is designed to improve segmentation of salient objects, which helps further improve their rank predictions. Experimental results show that our proposed network achieves state-of-the-art performances on the salient object ranking task across multiple datasets.
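To make the module description in the abstract concrete, the following is a minimal, hypothetical PyTorch-style sketch of one selective-attention step: each candidate object feature attends over the global image features and is mapped to a relative saliency score. The class, parameter names, and dimensions are illustrative assumptions only and do not come from the paper or its released code.

```python
# Hypothetical sketch (not the authors' implementation): cross-attention
# between per-object features and image features, in the spirit of a
# selective-attention module for saliency rank prediction.
import torch
import torch.nn as nn


class SelectiveAttentionSketch(nn.Module):
    """Scores N candidate object instances by letting each object feature
    attend over flattened image features (illustrative only)."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.rank_head = nn.Linear(dim, 1)  # one relative-saliency score per object

    def forward(self, obj_feats: torch.Tensor, img_feats: torch.Tensor) -> torch.Tensor:
        # obj_feats: (B, N, dim) features of N object proposals
        # img_feats: (B, HW, dim) flattened backbone feature map
        ctx, _ = self.attn(query=obj_feats, key=img_feats, value=img_feats)
        scores = self.rank_head(ctx + obj_feats).squeeze(-1)  # (B, N)
        return scores


# Example: 5 object proposals over a 32x32 (=1024 locations) feature map.
scores = SelectiveAttentionSketch()(torch.randn(2, 5, 256), torch.randn(2, 1024, 256))
print(scores.shape)  # torch.Size([2, 5])
```

Sorting each image's per-object scores in descending order would then give a predicted attention-shift order; the SMM/SAM/SIEM design described in the paper is more involved than this sketch.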
Published in: | International Journal of Computer Vision |
---|---|
ISSN: | 0920-5691 (print), 1573-1405 (electronic) |
Published: | Springer Science and Business Media LLC, 2024 |
Online Access: | Check full text |
URI: | https://cronfa.swan.ac.uk/Record/cronfa64287 |
Authors: | Avishek Siris, Jianbo Jiao, Gary Tam (ORCID 0000-0001-7387-5180), Xianghua Xie (ORCID 0000-0002-2701-8660), Rynson Lau |
---|---|
Keywords: | Attention Shift, Saliency, Saliency Ranking, Salient Object Detection |
Published date: | 1 March 2024 |
Institution: | Swansea University |
Department: | School of Mathematics and Computer Science - Computer Science, Faculty of Science and Engineering |
Open Access: | SU Library paid the OA fee (TA Institutional Deal) |
Funders: | This work was funded by a Swansea University Doctoral Training Postgraduate Research Scholarship 0301[164]. For the purpose of Open Access the author has applied a CC BY copyright licence to any Author Accepted Manuscript version arising from this submission. Jianbo Jiao is supported by the EPSRC Programme Grant Visual AI EP/T028572/1. Gary Tam is supported by the Royal Society grant IEC/NSFC/211159. This work was supported by the Research Grants Council of Hong Kong (Grant No.: 11205620), and a Strategic Research Grant from City University of Hong Kong (Ref.: 7005674). |
Document: | 64287.VOR.pdf, Version of Record, application/pdf (4.23MB), licence: http://creativecommons.org/licenses/by/4.0/ |