From attributes to natural language: A survey and foresight on text-based person re-identification

JIANG, FANZHI; Yang, Scott; Jones, Mark; Zhang, Liumei

doi:10.1016/j.inffus.2024.102879

Journal article

Information Fusion, Volume: 118, Start page: 102879

Swansea University Authors: FANZHI JIANG, Scott Yang, Mark Jones

PDF | Accepted Manuscript

Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).

DOI (Published version): 10.1016/j.inffus.2024.102879

Abstract

Text-based person re-identification (Re-ID) is a challenging topic in the field of complex multimodal analysis; its ultimate aim is to recognize specific pedestrians by scrutinizing attributes/natural language descriptions. Despite the wide range of applicable areas such as security surveillance, vid...

Published in:	Information Fusion
ISSN:	1566-2535 1872-6305
Published:	Elsevier BV 2025
URI:	https://cronfa.swan.ac.uk/Record/cronfa68609

first_indexed	2025-01-09T20:33:57Z
last_indexed	2025-02-19T07:28:21Z
id	cronfa68609
recordtype	SURis
fullrecord	<?xml version="1.0"?><rfc1807><datestamp>2025-02-18T10:38:29.9681605</datestamp><bib-version>v2</bib-version><id>68609</id><entry>2024-12-20</entry><title>From attributes to natural language: A survey and foresight on text-based person re-identification</title><swanseaauthors><author><sid>d3dcbe2b549acd06da61c3f2d52847d7</sid><firstname>FANZHI</firstname><surname>JIANG</surname><name>FANZHI JIANG</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>81dc663ca0e68c60908d35b1d2ec3a9b</sid><ORCID>0000-0002-6618-7483</ORCID><firstname>Scott</firstname><surname>Yang</surname><name>Scott Yang</name><active>true</active><ethesisStudent>false</ethesisStudent></author><author><sid>2e1030b6e14fc9debd5d5ae7cc335562</sid><ORCID>0000-0001-8991-1190</ORCID><firstname>Mark</firstname><surname>Jones</surname><name>Mark Jones</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2024-12-20</date><abstract>Text-based person re-identification (Re-ID) is a challenging topic in the field of complex multimodal analysis; its ultimate aim is to recognize specific pedestrians by scrutinizing attributes/natural language descriptions. Despite the wide range of applicable areas such as security surveillance, video retrieval, person tracking, and social media analytics, there is a notable absence of comprehensive reviews dedicated to summarizing the text-based person Re-ID from a technical perspective. To address this gap, we propose to introduce a taxonomy spanning Evaluation, Strategy, Architecture, and Optimization dimensions, providing a comprehensive survey of the text-based person Re-ID task. We start by laying the groundwork for text-based person Re-ID, elucidating fundamental concepts related to attribute/natural language-based identification. Then a thorough examination of existing benchmark datasets and metrics is presented.
Subsequently, we further delve into prevalent feature extraction strategies employed in text-based person Re-ID research, followed by a concise summary of common network architectures within the domain. Prevalent loss functions utilized for model optimization and modality alignment in text-based person Re-ID are also scrutinized. To conclude, we offer a concise summary of our findings, pinpointing challenges in text-based person Re-ID. In response to these challenges, we outline potential avenues for future open-set text-based person Re-ID and present a baseline architecture for text-based pedestrian image generation guided re-identification (TBPGR).</abstract><type>Journal Article</type><journal>Information Fusion</journal><volume>118</volume><journalNumber/><paginationStart>102879</paginationStart><paginationEnd/><publisher>Elsevier BV</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>1566-2535</issnPrint><issnElectronic>1872-6305</issnElectronic><keywords>Person re-identification; Text; Natural language; Attributes; Diffusion model</keywords><publishedDay>1</publishedDay><publishedMonth>6</publishedMonth><publishedYear>2025</publishedYear><publishedDate>2025-06-01</publishedDate><doi>10.1016/j.inffus.2024.102879</doi><url/><notes/><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><apcterm>Not Required</apcterm><funders>This document is the results of the research project funded by The Engineering and Physical Sciences Research Council of UK Research and Innovation (UKRI)</funders><projectreference/><lastEdited>2025-02-18T10:38:29.9681605</lastEdited><Created>2024-12-20T09:47:59.1951735</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer 
Science</level></path><authors><author><firstname>FANZHI</firstname><surname>JIANG</surname><order>1</order></author><author><firstname>Scott</firstname><surname>Yang</surname><orcid>0000-0002-6618-7483</orcid><order>2</order></author><author><firstname>Mark</firstname><surname>Jones</surname><orcid>0000-0001-8991-1190</orcid><order>3</order></author><author><firstname>Liumei</firstname><surname>Zhang</surname><orcid>0000-0002-1834-5424</orcid><order>4</order></author></authors><documents><document><filename>68609__33429__d006d633bb4e4144952e2fbcbd2302d4.pdf</filename><originalFilename>Text-based person re-identification_PrePrint Accepted Version.pdf</originalFilename><uploaded>2025-01-28T16:44:23.7028240</uploaded><type>Output</type><contentLength>2059077</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><documentNotes>Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/</licence></document></documents><OutputDurs/></rfc1807>
title	From attributes to natural language: A survey and foresight on text-based person re-identification
author_id_str_mv	d3dcbe2b549acd06da61c3f2d52847d7 81dc663ca0e68c60908d35b1d2ec3a9b 2e1030b6e14fc9debd5d5ae7cc335562
author_id_fullname_str_mv	d3dcbe2b549acd06da61c3f2d52847d7_***_FANZHI JIANG 81dc663ca0e68c60908d35b1d2ec3a9b_***_Scott Yang 2e1030b6e14fc9debd5d5ae7cc335562_***_Mark Jones
author	FANZHI JIANG Scott Yang Mark Jones
author2	FANZHI JIANG Scott Yang Mark Jones Liumei Zhang
format	Journal article
container_title	Information Fusion
container_volume	118
container_start_page	102879
publishDate	2025
institution	Swansea University
issn	1566-2535 1872-6305
doi_str_mv	10.1016/j.inffus.2024.102879
publisher	Elsevier BV
college_str	Faculty of Science and Engineering
hierarchytype
hierarchy_top_id	facultyofscienceandengineering
hierarchy_top_title	Faculty of Science and Engineering
hierarchy_parent_id	facultyofscienceandengineering
hierarchy_parent_title	Faculty of Science and Engineering
department_str	School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science
document_store_str	1
active_str	0
description	Text-based person re-identification (Re-ID) is a challenging topic in the field of complex multimodal analysis; its ultimate aim is to recognize specific pedestrians by scrutinizing attributes/natural language descriptions. Despite the wide range of applicable areas such as security surveillance, video retrieval, person tracking, and social media analytics, there is a notable absence of comprehensive reviews dedicated to summarizing the text-based person Re-ID from a technical perspective. To address this gap, we propose to introduce a taxonomy spanning Evaluation, Strategy, Architecture, and Optimization dimensions, providing a comprehensive survey of the text-based person Re-ID task. We start by laying the groundwork for text-based person Re-ID, elucidating fundamental concepts related to attribute/natural language-based identification. Then a thorough examination of existing benchmark datasets and metrics is presented. Subsequently, we further delve into prevalent feature extraction strategies employed in text-based person Re-ID research, followed by a concise summary of common network architectures within the domain. Prevalent loss functions utilized for model optimization and modality alignment in text-based person Re-ID are also scrutinized. To conclude, we offer a concise summary of our findings, pinpointing challenges in text-based person Re-ID. In response to these challenges, we outline potential avenues for future open-set text-based person Re-ID and present a baseline architecture for text-based pedestrian image generation guided re-identification (TBPGR).
published_date	2025-06-01T05:22:25Z
_version_	1851550487654957056
score	11.090091
