Journal article

Inferring Attention Shifts for Salient Instance Ranking

Avishek Siris, Jianbo Jiao, Gary Tam, Xianghua Xie, Rynson Lau

International Journal of Computer Vision, Volume: 132, Issue: 3, Pages: 964 - 986

Swansea University Authors: Avishek Siris, Gary Tam, Xianghua Xie, Rynson Lau

  • 64287.VOR.pdf

    PDF | Version of Record

    © The Author(s) 2023. This article is licensed under a Creative Commons Attribution 4.0 International License.

    Download (4.23MB)

Abstract

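The human visual system has limited capacity in simultaneously processing multiple visual inputs. Consequently, humans rely on shifting their attention from one location to another. When viewing an image of complex scenes, psychology studies and behavioural observations show that humans prioritise and sequentially shift attention among multiple visual stimuli. In this paper, we propose to predict the saliency rank of multiple objects by inferring human attention shift. We first construct a new large-scale salient object ranking dataset, with the saliency rank of objects defined by the order that an observer attends to these objects via attention shift. We then propose a new deep learning based model to leverage both bottom-up and top-down attention mechanisms for saliency rank prediction. Our model includes three novel modules: Spatial Mask Module (SMM), Selective Attention Module (SAM) and Salient Instance Edge Module (SIEM). SMM integrates bottom-up and semantic object properties to enhance contextual object features, from which SAM learns the dependencies between object features and image features for saliency reasoning. SIEM is designed to improve segmentation of salient objects, which helps further improve their rank predictions. Experimental results show that our proposed network achieves state-of-the-art performances on the salient object ranking task across multiple datasets.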

Published in: International Journal of Computer Vision
ISSN: 0920-5691 1573-1405
Published: Springer Science and Business Media LLC 2024

URI: https://cronfa.swan.ac.uk/Record/cronfa64287
first_indexed 2023-09-01T09:11:41Z
last_indexed 2023-09-01T09:11:41Z
id cronfa64287
recordtype SURis
fullrecord <?xml version="1.0" encoding="utf-8"?><rfc1807 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><bib-version>v2</bib-version><id>64287</id><entry>2023-09-01</entry><title>Inferring Attention Shifts for Salient Instance Ranking</title><swanseaauthors><author><sid>896b738a2b485a166c052d94bca5fa68</sid><firstname>Avishek</firstname><surname>Siris</surname><name>Avishek Siris</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>e75a68e11a20e5f1da94ee6e28ff5e76</sid><ORCID>0000-0001-7387-5180</ORCID><firstname>Gary</firstname><surname>Tam</surname><name>Gary Tam</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>b334d40963c7a2f435f06d2c26c74e11</sid><ORCID>0000-0002-2701-8660</ORCID><firstname>Xianghua</firstname><surname>Xie</surname><name>Xianghua Xie</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>8d230434b6eadb1be5928241b0beecd0</sid><firstname>Rynson</firstname><surname>Lau</surname><name>Rynson Lau</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2023-09-01</date><deptcode>SCS</deptcode><abstract>The human visual system has limited capacity in simultaneously processing multiple visual inputs. Consequently, humans rely on shifting their attention from one location to another. When viewing an image of complex scenes, psychology studies and behavioural observations show that humans prioritise and sequentially shift attention among multiple visual stimuli. In this paper, we propose to predict the saliency rank of multiple objects by inferring human attention shift. We first construct a new large-scale salient object ranking dataset, with the saliency rank of objects defined by the order that an observer attends to these objects via attention shift. 
We then propose a new deep learning based model to leverage both bottom-up and top-down attention mechanisms for saliency rank prediction. Our model includes three novel modules: Spatial Mask Module (SMM), Selective Attention Module (SAM) and Salient Instance Edge Module (SIEM). SMM integrates bottom-up and semantic object properties to enhance contextual object features, from which SAM learns the dependencies between object features and image features for saliency reasoning. SIEM is designed to improve segmentation of salient objects, which helps further improve their rank predictions. Experimental results show that our proposed network achieves state-of-the-art performances on the salient object ranking task across multiple datasets.</abstract><type>Journal Article</type><journal>International Journal of Computer Vision</journal><volume>132</volume><journalNumber>3</journalNumber><paginationStart>964</paginationStart><paginationEnd>986</paginationEnd><publisher>Springer Science and Business Media LLC</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>0920-5691</issnPrint><issnElectronic>1573-1405</issnElectronic><keywords>Attention Shift, Saliency, Saliency Ranking, Salient Object Detection.</keywords><publishedDay>1</publishedDay><publishedMonth>3</publishedMonth><publishedYear>2024</publishedYear><publishedDate>2024-03-01</publishedDate><doi>10.1007/s11263-023-01906-7</doi><url/><notes/><college>COLLEGE NANME</college><department>Computer Science</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>SCS</DepartmentCode><institution>Swansea University</institution><apcterm>SU Library paid the OA fee (TA Institutional Deal)</apcterm><funders>This work was funded by a Swansea University Doctoral Training Postgraduate Research Scholarship 0301[164]. For the purpose of Open Access the author has applied a CC BY copyright licence to any Author Accepted Manuscript version arising from this submission. 
Jianbo Jiao is supported by the EPSRC Programme Grant Visual AI EP/T028572/1. Gary Tam is supported by the Royal Society grant IEC/NSFC/211159. This work was supported by the Research Grants Council of Hong Kong (Grant No.: 11205620), and a Strategic Research Grant from City University of Hong Kong (Ref.: 7005674).</funders><projectreference/><lastEdited>2024-04-16T13:16:06.3412046</lastEdited><Created>2023-09-01T10:01:59.2765646</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>Avishek</firstname><surname>Siris</surname><order>1</order></author><author><firstname>Jianbo</firstname><surname>Jiao</surname><order>2</order></author><author><firstname>Gary</firstname><surname>Tam</surname><orcid>0000-0001-7387-5180</orcid><order>3</order></author><author><firstname>Xianghua</firstname><surname>Xie</surname><orcid>0000-0002-2701-8660</orcid><order>4</order></author><author><firstname>Rynson</firstname><surname>Lau</surname><order>5</order></author></authors><documents><document><filename>64287__30034__a515b9124a2c47c1b4370771d3d23bc6.pdf</filename><originalFilename>64287.VOR.pdf</originalFilename><uploaded>2024-04-16T13:13:37.1189188</uploaded><type>Output</type><contentLength>4434052</contentLength><contentType>application/pdf</contentType><version>Version of Record</version><cronfaStatus>true</cronfaStatus><documentNotes>© The Author(s) 2023. This article is licensed under a Creative Commons Attribution 4.0 International License.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>http://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807>
title Inferring Attention Shifts for Salient Instance Ranking
author_id_str_mv 896b738a2b485a166c052d94bca5fa68
e75a68e11a20e5f1da94ee6e28ff5e76
b334d40963c7a2f435f06d2c26c74e11
8d230434b6eadb1be5928241b0beecd0
author_id_fullname_str_mv 896b738a2b485a166c052d94bca5fa68_***_Avishek Siris
e75a68e11a20e5f1da94ee6e28ff5e76_***_Gary Tam
b334d40963c7a2f435f06d2c26c74e11_***_Xianghua Xie
8d230434b6eadb1be5928241b0beecd0_***_Rynson Lau
author Avishek Siris
Gary Tam
Xianghua Xie
Rynson Lau
author2 Avishek Siris
Jianbo Jiao
Gary Tam
Xianghua Xie
Rynson Lau
format Journal article
container_title International Journal of Computer Vision
container_volume 132
container_issue 3
container_start_page 964
publishDate 2024
institution Swansea University
issn 0920-5691
1573-1405
doi_str_mv 10.1007/s11263-023-01906-7
publisher Springer Science and Business Media LLC
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science
document_store_str 1
active_str 0
description The human visual system has limited capacity in simultaneously processing multiple visual inputs. Consequently, humans rely on shifting their attention from one location to another. When viewing an image of complex scenes, psychology studies and behavioural observations show that humans prioritise and sequentially shift attention among multiple visual stimuli. In this paper, we propose to predict the saliency rank of multiple objects by inferring human attention shift. We first construct a new large-scale salient object ranking dataset, with the saliency rank of objects defined by the order that an observer attends to these objects via attention shift. We then propose a new deep learning based model to leverage both bottom-up and top-down attention mechanisms for saliency rank prediction. Our model includes three novel modules: Spatial Mask Module (SMM), Selective Attention Module (SAM) and Salient Instance Edge Module (SIEM). SMM integrates bottom-up and semantic object properties to enhance contextual object features, from which SAM learns the dependencies between object features and image features for saliency reasoning. SIEM is designed to improve segmentation of salient objects, which helps further improve their rank predictions. Experimental results show that our proposed network achieves state-of-the-art performances on the salient object ranking task across multiple datasets.
published_date 2024-03-01T13:16:03Z