
E-Thesis

Delving into human visual attention for saliency detection of real-world images / AVISHEK SIRIS

Swansea University Author: AVISHEK SIRIS

DOI (Published version): 10.23889/SUthesis.60538

Abstract

Saliency detection explores the problem of identifying regions or objects that stand out from their surroundings. It is one of the fundamental problems in computer vision, with applications in other graphics, vision and robotics tasks. Relative saliency ranking is a new problem that ha...


Published: Swansea 2022
Institution: Swansea University
Degree level: Doctoral
Degree name: Ph.D
Supervisor: Tam, Gary K.L.
URI: https://cronfa.swan.ac.uk/Record/cronfa60538
first_indexed 2022-07-19T16:16:07Z
last_indexed 2023-01-13T19:20:44Z
id cronfa60538
recordtype RisThesis
fullrecord <?xml version="1.0" encoding="utf-8"?><rfc1807 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><bib-version>v2</bib-version><id>60538</id><entry>2022-07-19</entry><title>Delving into human visual attention for saliency detection of real-world images</title><swanseaauthors><author><sid>a7e009a3eb6ac7b910d8789bc283b60e</sid><firstname>AVISHEK</firstname><surname>SIRIS</surname><name>AVISHEK SIRIS</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2022-07-19</date><abstract>Saliency detection explores the problem of identifying regions or objects that stand out from their surroundings. It is one of the fundamental problems in computer vision, with applications in other graphics, vision and robotics tasks. Relative saliency ranking is a new problem that has been introduced with the idea of determining ranking based on the differences in saliency agreement between multiple observers. This approach can lead to multiple objects being given the same saliency rank. However, psychology studies and behavioural observations show that humans shift their attention from one location to another when viewing an image. This is because the human visual system has limited capacity to process multiple visual inputs simultaneously. We consider the sequential shifting of attention between objects as a form of saliency ranking, and we therefore propose a new problem of saliency ranking based on attention shift. Although methods have been proposed for predicting saliency ranks, they are not able to model this human attention shift well. They are primarily based on ranking saliency values from binary prediction, which does not properly facilitate saliency rank reasoning between multiple individual objects. In this thesis, we aim to explore deep learning techniques for learning to rank salient objects by inferring human attention shift. We first construct a large-scale salient object ranking dataset. We define the saliency rank of objects by the order in which an observer attends to them, based on attention shift. We then propose a deep learning model built from bottom-up and top-down attention mechanisms for performing saliency ranking. Our model is evaluated with both quantitative and qualitative experiments, in which our proposed approach achieves state-of-the-art performance.
Regarding traditional salient object detection, we observe two main issues that lead to recent techniques failing in complex real-world image scenes. First, most existing datasets consist of images with simple foregrounds and backgrounds and a limited number of objects, which hardly represent real-life scenarios. Second, current methods only learn contextual features of salient objects with binary saliency labels. This is not sufficient for a model to learn high-level semantics for saliency reasoning in complex scenes. We begin to address these problems by constructing a new large-scale dataset with complex scenes rich in context. We then propose a context-aware saliency network that learns to explicitly exploit the semantic scene contexts of an image. We perform extensive experiments to demonstrate that our proposed network outperforms state-of-the-art methods. The evaluation also shows the effectiveness of leveraging high-level scene semantics for saliency detection in complex scenarios, while also transferring well to other existing datasets.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Attention, Attention Shift, Saliency, Saliency Ranking, Salient Object Detection</keywords><publishedDay>15</publishedDay><publishedMonth>7</publishedMonth><publishedYear>2022</publishedYear><publishedDate>2022-07-15</publishedDate><doi>10.23889/SUthesis.60538</doi><url/><notes>ORCiD identifier: https://orcid.org/0000-0002-3064-2202</notes><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Tam, Gary K.L.</supervisor><degreelevel>Doctoral</degreelevel><degreename>Ph.D</degreename><degreesponsorsfunders>Swansea Science DTC Postgraduate Research Scholarship</degreesponsorsfunders><apcterm/><funders/><projectreference/><lastEdited>2024-04-20T16:46:24.5416346</lastEdited><Created>2022-07-19T17:12:00.9698595</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>AVISHEK</firstname><surname>SIRIS</surname><order>1</order></author></authors><documents><document><filename>60538__24651__c12d67d6383a4d228ff8273b5f625a07.pdf</filename><originalFilename>Siris_Avishek_PhD_Thesis_Final_Embargoed_Redacted_Signature.pdf</originalFilename><uploaded>2022-07-19T17:28:55.6297149</uploaded><type>Output</type><contentLength>52406977</contentLength><contentType>application/pdf</contentType><version>E-Thesis – open access</version><cronfaStatus>true</cronfaStatus><embargoDate>2023-07-19T00:00:00.0000000</embargoDate><documentNotes>Copyright: The author, Avishek Siris, 2022.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title Delving into human visual attention for saliency detection of real-world images
author_id_str_mv a7e009a3eb6ac7b910d8789bc283b60e
author_id_fullname_str_mv a7e009a3eb6ac7b910d8789bc283b60e_***_AVISHEK SIRIS
author AVISHEK SIRIS
author2 AVISHEK SIRIS
format E-Thesis
publishDate 2022
institution Swansea University
doi_str_mv 10.23889/SUthesis.60538
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Mathematics and Computer Science - Computer Science; Faculty of Science and Engineering
document_store_str 1
active_str 0
description Saliency detection explores the problem of identifying regions or objects that stand out from their surroundings. It is one of the fundamental problems in computer vision, with applications in other graphics, vision and robotics tasks. Relative saliency ranking is a new problem that has been introduced with the idea of determining ranking based on the differences in saliency agreement between multiple observers. This approach can lead to multiple objects being given the same saliency rank. However, psychology studies and behavioural observations show that humans shift their attention from one location to another when viewing an image. This is because the human visual system has limited capacity to process multiple visual inputs simultaneously. We consider the sequential shifting of attention between objects as a form of saliency ranking, and we therefore propose a new problem of saliency ranking based on attention shift. Although methods have been proposed for predicting saliency ranks, they are not able to model this human attention shift well. They are primarily based on ranking saliency values from binary prediction, which does not properly facilitate saliency rank reasoning between multiple individual objects. In this thesis, we aim to explore deep learning techniques for learning to rank salient objects by inferring human attention shift. We first construct a large-scale salient object ranking dataset. We define the saliency rank of objects by the order in which an observer attends to them, based on attention shift. We then propose a deep learning model built from bottom-up and top-down attention mechanisms for performing saliency ranking. Our model is evaluated with both quantitative and qualitative experiments, in which our proposed approach achieves state-of-the-art performance.
Regarding traditional salient object detection, we observe two main issues that lead to recent techniques failing in complex real-world image scenes. First, most existing datasets consist of images with simple foregrounds and backgrounds and a limited number of objects, which hardly represent real-life scenarios. Second, current methods only learn contextual features of salient objects with binary saliency labels. This is not sufficient for a model to learn high-level semantics for saliency reasoning in complex scenes. We begin to address these problems by constructing a new large-scale dataset with complex scenes rich in context. We then propose a context-aware saliency network that learns to explicitly exploit the semantic scene contexts of an image. We perform extensive experiments to demonstrate that our proposed network outperforms state-of-the-art methods. The evaluation also shows the effectiveness of leveraging high-level scene semantics for saliency detection in complex scenarios, while also transferring well to other existing datasets.
published_date 2022-07-15T16:46:20Z
_version_ 1796869173254553600
score 11.037056