Journal article

From attributes to natural language: A survey and foresight on text-based person re-identification

FANZHI JIANG, Scott Yang, Mark Jones, Liumei Zhang

Information Fusion, Volume: 118, Start page: 102879

Swansea University Authors: FANZHI JIANG, Scott Yang, Mark Jones

  • Text-based person re-identification_PrePrint Accepted Version.pdf

    PDF | Accepted Manuscript

    Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).


Abstract

Text-based person re-identification (Re-ID) is a challenging topic in the field of complex multimodal analysis; its ultimate aim is to recognize specific pedestrians by scrutinizing attributes/natural language descriptions. Despite the wide range of applicable areas such as security surveillance, vid...


Published in: Information Fusion
ISSN: 1566-2535 1872-6305
Published: Elsevier BV 2025

URI: https://cronfa.swan.ac.uk/Record/cronfa68609
first_indexed 2025-01-09T20:33:57Z
last_indexed 2025-02-19T07:28:21Z
id cronfa68609
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2025-02-18T10:38:29.9681605</datestamp><bib-version>v2</bib-version><id>68609</id><entry>2024-12-20</entry><title>From attributes to natural language: A survey and foresight on text-based person re-identification</title><swanseaauthors><author><sid>d3dcbe2b549acd06da61c3f2d52847d7</sid><firstname>FANZHI</firstname><surname>JIANG</surname><name>FANZHI JIANG</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>81dc663ca0e68c60908d35b1d2ec3a9b</sid><ORCID>0000-0002-6618-7483</ORCID><firstname>Scott</firstname><surname>Yang</surname><name>Scott Yang</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>2e1030b6e14fc9debd5d5ae7cc335562</sid><ORCID>0000-0001-8991-1190</ORCID><firstname>Mark</firstname><surname>Jones</surname><name>Mark Jones</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2024-12-20</date><abstract>Text-based person re-identification (Re-ID) is a challenging topic in the field of complex multimodalanalysis, its ultimate aim is to recognize specific pedestrians by scrutinizing attributes/natural language descriptions. Despite the wide range of applicable areas such as security surveillance, video retrieval, person tracking, and social media analytics, there is a notable absence of comprehensive reviews dedicated to summarizing the text-based person Re-ID from a technical perspective. To address this gap, we propose to introduce a taxonomy spanning Evaluation, Strategy, Architecture, and Optimization dimensions, providing a comprehensive survey of the text-based person Re-ID task. We start by laying the groundwork for text-based person Re-ID, elucidating fundamental concepts related to attribute/natural language-based identification. Then a thorough examination of existing benchmark datasets and metrics is presented. 
Subsequently, we further delve into prevalent feature extraction strategies employed in text-based person Re-ID research, followed by a concise summary of common network architectures within the domain. Prevalent loss functions utilized for model optimization and modality alignment in text-based person Re-ID are also scrutinized. To conclude, we offer a concise summary of our findings, pinpointing challenges in text-based person Re-ID. In response to these challenges, we outline potential avenues for future open-set text-based person Re-ID and present a baseline architecture for text-based pedestrian image generation guided re-identification (TBPGR).</abstract><type>Journal Article</type><journal>Information Fusion</journal><volume>118</volume><journalNumber/><paginationStart>102879</paginationStart><paginationEnd/><publisher>Elsevier BV</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>1566-2535</issnPrint><issnElectronic>1872-6305</issnElectronic><keywords>Person re-identification; Text; Natural language; Attributes; Diffusion model</keywords><publishedDay>1</publishedDay><publishedMonth>6</publishedMonth><publishedYear>2025</publishedYear><publishedDate>2025-06-01</publishedDate><doi>10.1016/j.inffus.2024.102879</doi><url/><notes/><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><apcterm>Not Required</apcterm><funders>This document is the results of the research project funded by The Engineering and Physical Sciences Research Council of UK Research and Innovation (UKRI)</funders><projectreference/><lastEdited>2025-02-18T10:38:29.9681605</lastEdited><Created>2024-12-20T09:47:59.1951735</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer 
Science</level></path><authors><author><firstname>FANZHI</firstname><surname>JIANG</surname><order>1</order></author><author><firstname>Scott</firstname><surname>Yang</surname><orcid>0000-0002-6618-7483</orcid><order>2</order></author><author><firstname>Mark</firstname><surname>Jones</surname><orcid>0000-0001-8991-1190</orcid><order>3</order></author><author><firstname>Liumei</firstname><surname>Zhang</surname><orcid>0000-0002-1834-5424</orcid><order>4</order></author></authors><documents><document><filename>68609__33429__d006d633bb4e4144952e2fbcbd2302d4.pdf</filename><originalFilename>Text-based person re-identification_PrePrint Accepted Version.pdf</originalFilename><uploaded>2025-01-28T16:44:23.7028240</uploaded><type>Output</type><contentLength>2059077</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><documentNotes>Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807>
title From attributes to natural language: A survey and foresight on text-based person re-identification
author_id_str_mv d3dcbe2b549acd06da61c3f2d52847d7
81dc663ca0e68c60908d35b1d2ec3a9b
2e1030b6e14fc9debd5d5ae7cc335562
author_id_fullname_str_mv d3dcbe2b549acd06da61c3f2d52847d7_***_FANZHI JIANG
81dc663ca0e68c60908d35b1d2ec3a9b_***_Scott Yang
2e1030b6e14fc9debd5d5ae7cc335562_***_Mark Jones
author FANZHI JIANG
Scott Yang
Mark Jones
author2 FANZHI JIANG
Scott Yang
Mark Jones
Liumei Zhang
format Journal article
container_title Information Fusion
container_volume 118
container_start_page 102879
publishDate 2025
institution Swansea University
issn 1566-2535
1872-6305
doi_str_mv 10.1016/j.inffus.2024.102879
publisher Elsevier BV
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science
document_store_str 1
active_str 0
description Text-based person re-identification (Re-ID) is a challenging topic in the field of complex multimodal analysis; its ultimate aim is to recognize specific pedestrians by scrutinizing attributes/natural language descriptions. Despite the wide range of applicable areas such as security surveillance, video retrieval, person tracking, and social media analytics, there is a notable absence of comprehensive reviews dedicated to summarizing text-based person Re-ID from a technical perspective. To address this gap, we introduce a taxonomy spanning Evaluation, Strategy, Architecture, and Optimization dimensions, providing a comprehensive survey of the text-based person Re-ID task. We start by laying the groundwork for text-based person Re-ID, elucidating fundamental concepts related to attribute/natural language-based identification. We then present a thorough examination of existing benchmark datasets and metrics. Next, we examine prevalent feature extraction strategies employed in text-based person Re-ID research, followed by a concise summary of common network architectures within the domain. Prevalent loss functions utilized for model optimization and modality alignment in text-based person Re-ID are also scrutinized. To conclude, we offer a concise summary of our findings, pinpointing challenges in text-based person Re-ID. In response to these challenges, we outline potential avenues for future open-set text-based person Re-ID and present a baseline architecture for text-based pedestrian image generation guided re-identification (TBPGR).
published_date 2025-06-01T08:16:28Z