Delving into human visual attention for saliency detection of real-world images

SIRIS, AVISHEK

doi:10.23889/SUthesis.60538

E-Thesis

Delving into human visual attention for saliency detection of real-world images / AVISHEK SIRIS

Swansea University Author: AVISHEK SIRIS

PDF | E-Thesis – open access

Copyright: The author, Avishek Siris, 2022.

DOI (Published version): 10.23889/SUthesis.60538

Abstract

Saliency detection explores the problem of identifying regions or objects that stand out from their surroundings. It is one of the fundamental problems in computer vision, with applications across other graphics, vision and robotics tasks. Relative saliency ranking is a new problem that ha...

Published:	Swansea 2022
Institution:	Swansea University
Degree level:	Doctoral
Degree name:	Ph.D
Supervisor:	Tam, Gary K.L.
URI:	https://cronfa.swan.ac.uk/Record/cronfa60538

first_indexed	2022-07-19T16:16:07Z
last_indexed	2024-11-14T12:17:31Z
id	cronfa60538
recordtype	RisThesis
fullrecord	<?xml version="1.0"?><rfc1807><datestamp>2024-04-20T16:46:24.5416346</datestamp><bib-version>v2</bib-version><id>60538</id><entry>2022-07-19</entry><title>Delving into human visual attention for saliency detection of real-world images</title><swanseaauthors><author><sid>a7e009a3eb6ac7b910d8789bc283b60e</sid><firstname>AVISHEK</firstname><surname>SIRIS</surname><name>AVISHEK SIRIS</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2022-07-19</date><abstract>Saliency detection explores the problem of identifying regions or objects that stand out from their surroundings. It is one of the fundamental problems in computer vision, with applications across other graphics, vision and robotics tasks. Relative saliency ranking is a new problem that has been introduced with the idea of determining ranking based on differences in the saliency agreement between multiple observers. This approach can lead to multiple objects being given the same saliency rank. However, psychology studies and behavioural observations show that humans shift their attention from one location to another when viewing an image. This is because the human visual system has limited capacity to process multiple visual inputs simultaneously. We consider the sequential shifting of attention across objects as a form of saliency ranking, and thus propose a new problem of saliency ranking based on attention shift. Although methods have been proposed for predicting saliency ranks, they are not able to model this human attention shift well. They are primarily based on ranking saliency values from binary prediction, which does not properly facilitate saliency rank reasoning between multiple individual objects. In this thesis, we aim to explore deep learning techniques for learning to rank salient objects by inferring human attention shift. We first construct a large-scale salient object ranking dataset. We define the saliency rank of objects by the order in which an observer attends to them based on attention shift. We then propose a deep learning model that is built from bottom-up and top-down attention mechanisms for performing saliency ranking. Our model is evaluated with both quantitative and qualitative experiments, in which our proposed approach achieves state-of-the-art performance. Regarding traditional salient object detection, we observe two main issues that lead to recent techniques failing in real-world complex image scenes. First, most existing datasets consist of images with simple foregrounds and backgrounds and a limited number of objects, which hardly represent real-life scenarios. Second, current methods only learn contextual features of salient objects with binary saliency labels. This is insufficient for a model to learn high-level semantics for saliency reasoning in complex scenes. We begin to address these problems by constructing a new large-scale dataset with complex scenes rich in context. We then propose a context-aware saliency network that learns to explicitly exploit the semantic scene contexts of an image. We perform extensive experiments to demonstrate that our proposed network outperforms state-of-the-art methods. The evaluation also shows the effectiveness of leveraging high-level scene semantics for saliency detection in complex scenarios, and that the approach transfers well to other existing datasets.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Attention, Attention Shift, Saliency, Saliency Ranking, Salient Object Detection</keywords><publishedDay>15</publishedDay><publishedMonth>7</publishedMonth><publishedYear>2022</publishedYear><publishedDate>2022-07-15</publishedDate><doi>10.23889/SUthesis.60538</doi><url/><notes>ORCiD identifier: https://orcid.org/0000-0002-3064-2202</notes><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Tam, Gary K.L.</supervisor><degreelevel>Doctoral</degreelevel><degreename>Ph.D</degreename><degreesponsorsfunders>Swansea Science DTC Postgraduate Research Scholarship</degreesponsorsfunders><apcterm/><funders/><projectreference/><lastEdited>2024-04-20T16:46:24.5416346</lastEdited><Created>2022-07-19T17:12:00.9698595</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>AVISHEK</firstname><surname>SIRIS</surname><order>1</order></author></authors><documents><document><filename>60538__24651__c12d67d6383a4d228ff8273b5f625a07.pdf</filename><originalFilename>Siris_Avishek_PhD_Thesis_Final_Embargoed_Redacted_Signature.pdf</originalFilename><uploaded>2022-07-19T17:28:55.6297149</uploaded><type>Output</type><contentLength>52406977</contentLength><contentType>application/pdf</contentType><version>E-Thesis – open access</version><cronfaStatus>true</cronfaStatus><embargoDate>2023-07-19T00:00:00.0000000</embargoDate><documentNotes>Copyright: The author, Avishek Siris, 2022.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title	Delving into human visual attention for saliency detection of real-world images
author_id_str_mv	a7e009a3eb6ac7b910d8789bc283b60e
author_id_fullname_str_mv	a7e009a3eb6ac7b910d8789bc283b60e_***_AVISHEK SIRIS
author	AVISHEK SIRIS
author2	AVISHEK SIRIS
format	E-Thesis
publishDate	2022
institution	Swansea University
doi_str_mv	10.23889/SUthesis.60538
college_str	Faculty of Science and Engineering
hierarchytype
hierarchy_top_id	facultyofscienceandengineering
hierarchy_top_title	Faculty of Science and Engineering
hierarchy_parent_id	facultyofscienceandengineering
hierarchy_parent_title	Faculty of Science and Engineering
department_str	School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science
document_store_str	1
active_str	0
description	Saliency detection explores the problem of identifying regions or objects that stand out from their surroundings. It is one of the fundamental problems in computer vision, with applications across other graphics, vision and robotics tasks. Relative saliency ranking is a new problem that has been introduced with the idea of determining ranking based on differences in the saliency agreement between multiple observers. This approach can lead to multiple objects being given the same saliency rank. However, psychology studies and behavioural observations show that humans shift their attention from one location to another when viewing an image. This is because the human visual system has limited capacity to process multiple visual inputs simultaneously. We consider the sequential shifting of attention across objects as a form of saliency ranking, and thus propose a new problem of saliency ranking based on attention shift. Although methods have been proposed for predicting saliency ranks, they are not able to model this human attention shift well. They are primarily based on ranking saliency values from binary prediction, which does not properly facilitate saliency rank reasoning between multiple individual objects. In this thesis, we aim to explore deep learning techniques for learning to rank salient objects by inferring human attention shift. We first construct a large-scale salient object ranking dataset. We define the saliency rank of objects by the order in which an observer attends to them based on attention shift. We then propose a deep learning model that is built from bottom-up and top-down attention mechanisms for performing saliency ranking. Our model is evaluated with both quantitative and qualitative experiments, in which our proposed approach achieves state-of-the-art performance. Regarding traditional salient object detection, we observe two main issues that lead to recent techniques failing in real-world complex image scenes. First, most existing datasets consist of images with simple foregrounds and backgrounds and a limited number of objects, which hardly represent real-life scenarios. Second, current methods only learn contextual features of salient objects with binary saliency labels. This is insufficient for a model to learn high-level semantics for saliency reasoning in complex scenes. We begin to address these problems by constructing a new large-scale dataset with complex scenes rich in context. We then propose a context-aware saliency network that learns to explicitly exploit the semantic scene contexts of an image. We perform extensive experiments to demonstrate that our proposed network outperforms state-of-the-art methods. The evaluation also shows the effectiveness of leveraging high-level scene semantics for saliency detection in complex scenarios, and that the approach transfers well to other existing datasets.
published_date	2022-07-15T09:11:20Z
_version_	1829002470270959616
score	11.058203

Similar Items