Multi-modal Dynamic Point Cloud Geometric Compression Based on Bidirectional Recurrent Scene Flow

Nan, Fangzhe; Li, Frederick; Wang, Zhuoyue; Tam, Gary; Jiang, Zhaoyi; DongZheng; Yang, Bailin

doi:10.1109/ICASSP49660.2025.10888353

Conference Paper/Proceeding/Abstract

ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Pages: 1 - 5

Swansea University Author: Gary Tam

PDF | Accepted Manuscript

Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).

DOI (Published version): 10.1109/ICASSP49660.2025.10888353

Abstract

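Deep learning methods have recently shown significant promise in compressing the geometric features of point clouds. However, challenges arise when consecutive point clouds contain holes, resulting in incomplete information that complicates motion estimation. To our knowledge, most existing dynamic point cloud compression methods have largely overlooked this critical issue. Moreover, these methods typically employ a multi-scale single-pass approach for motion estimation, performing only one estimation at each scale. This limits accuracy and adversely impacts compression performance. To address these challenges, we propose a dynamic point cloud compression model called M2BR-DPCC (Multi-Modal Multi-Scale Bidirectional Recursion for Dynamic Point Cloud Compression). Our method introduces two key innovations. First, we integrate both point cloud and image data as inputs, leveraging a multi-modal feature representation completion (MFRepC) approach to align information across modalities. This addresses the issue of missing data in point clouds by using complementary information from images. Second, we implement a multi-scale bidirectional recursive (MSBR) motion estimation method. This module iteratively refines motion flows in both forward and backward directions, progressively enhancing point cloud features and improving motion estimation accuracy. Experimental results on widely used datasets, including MVUB and 8iVFB, demonstrate the effectiveness of our approach. Compared to existing methods, M2BR-DPCC achieves superior performance, with an average BD-rate improvement of 95.23% over V-PCC, 12.92% over D-DPCC, and 16.16% over patchDPCC. These results underscore the potential of leveraging multi-modal data and bidirectional refinement for dynamic point cloud compression.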

Published in:	ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
ISBN:	979-8-3503-6875-8 979-8-3503-6874-1
ISSN:	1520-6149 2379-190X
Published:	IEEE 2025
URI:	https://cronfa.swan.ac.uk/Record/cronfa68655

first_indexed	2025-01-30T16:02:06Z
last_indexed	2026-01-23T04:22:00Z
id	cronfa68655
recordtype	SURis
fullrecord	<?xml version="1.0"?><rfc1807><datestamp>2026-01-21T16:24:49.8657913</datestamp><bib-version>v2</bib-version><id>68655</id><entry>2025-01-06</entry><title>Multi-modal Dynamic Point Cloud Geometric Compression Based on Bidirectional Recurrent Scene Flow</title><swanseaauthors><author><sid>e75a68e11a20e5f1da94ee6e28ff5e76</sid><ORCID>0000-0001-7387-5180</ORCID><firstname>Gary</firstname><surname>Tam</surname><name>Gary Tam</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2025-01-06</date><deptcode>MACS</deptcode><abstract>Deep learning methods have recently shown significant promise in compressing the geometric features of point clouds. However, challenges arise when consecutive point clouds contain holes, resulting in incomplete information that complicates motion estimation. To our knowledge, most existing dynamic point cloud compression methods have largely overlooked this critical issue. Moreover, these methods typically employ a multi-scale single-pass approach for motion estimation, performing only one estimation at each scale. This limits accuracy and adversely impacts compression performance. To address these challenges, we propose a dynamic point cloud compression model called M2BR-DPCC (Multi-Modal Multi-Scale Bidirectional Recursion for Dynamic Point Cloud Compression). Our method introduces two key innovations. First, we integrate both point cloud and image data as inputs, leveraging a multi-modal feature representation completion (MFRepC) approach to align information across modalities. This addresses the issue of missing data in point clouds by using complementary information from images. Second, we implement a multi-scale bidirectional recursive (MSBR) motion estimation method. This module iteratively refines motion flows in both forward and backward directions, progressively enhancing point cloud features and improving motion estimation accuracy. 
Experimental results on widely used datasets, including MVUB and 8iVFB, demonstrate the effectiveness of our approach. Compared to existing methods, M2BR-DPCC achieves superior performance, with an average BD-rate improvement of 95.23% over V-PCC, 12.92% over D-DPCC, and 16.16% over patchDPCC. These results underscore the potential of leveraging multi-modal data and bidirectional refinement for dynamic point cloud compression.</abstract><type>Conference Paper/Proceeding/Abstract</type><journal>ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</journal><volume/><journalNumber/><paginationStart>1</paginationStart><paginationEnd>5</paginationEnd><publisher>IEEE</publisher><placeOfPublication/><isbnPrint>979-8-3503-6875-8</isbnPrint><isbnElectronic>979-8-3503-6874-1</isbnElectronic><issnPrint>1520-6149</issnPrint><issnElectronic>2379-190X</issnElectronic><keywords>Point cloud compression; Technological innovation; Image coding; Accuracy; Limiting; Motion estimation; Dynamics; Estimation; Filling; Speech processing</keywords><publishedDay>7</publishedDay><publishedMonth>3</publishedMonth><publishedYear>2025</publishedYear><publishedDate>2025-03-07</publishedDate><doi>10.1109/ICASSP49660.2025.10888353</doi><url/><notes/><college>COLLEGE NAME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm>Not Required</apcterm><funders>National Natural Science Foundation Grant 62172366, Zhejiang Province Natural Science Foundation No. 
LY21F020013, LY22F020013; Royal Society grant IEC/NSFC/211159</funders><projectreference/><lastEdited>2026-01-21T16:24:49.8657913</lastEdited><Created>2025-01-06T11:08:51.4583579</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>Fangzhe</firstname><surname>Nan</surname><order>1</order></author><author><firstname>Frederick</firstname><surname>Li</surname><order>2</order></author><author><firstname>Zhuoyue</firstname><surname>Wang</surname><order>3</order></author><author><firstname>Gary</firstname><surname>Tam</surname><orcid>0000-0001-7387-5180</orcid><order>4</order></author><author><firstname>Zhaoyi</firstname><surname>Jiang</surname><order>5</order></author><author><firstname/><surname>DongZheng</surname><order>6</order></author><author><firstname>Bailin</firstname><surname>Yang</surname><order>7</order></author></authors><documents><document><filename>68655__33249__a01ee00527a4417e9b82069afca7649b.pdf</filename><originalFilename>ICASSP2025_paper.pdf</originalFilename><uploaded>2025-01-06T11:14:18.6123169</uploaded><type>Output</type><contentLength>1315769</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><documentNotes>Author accepted manuscript document released under the terms of a Creative Commons CC-BY licence using the Swansea University Research Publications Policy (rights retention).</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language><licence>https://creativecommons.org/licenses/by/4.0/deed.en</licence></document></documents><OutputDurs/></rfc1807>
title	Multi-modal Dynamic Point Cloud Geometric Compression Based on Bidirectional Recurrent Scene Flow
author_id_str_mv	e75a68e11a20e5f1da94ee6e28ff5e76
author_id_fullname_str_mv	e75a68e11a20e5f1da94ee6e28ff5e76_***_Gary Tam
author	Gary Tam
author2	Fangzhe Nan Frederick Li Zhuoyue Wang Gary Tam Zhaoyi Jiang DongZheng Bailin Yang
format	Conference Paper/Proceeding/Abstract
container_title	ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
container_start_page	1
publishDate	2025
institution	Swansea University
isbn	979-8-3503-6875-8 979-8-3503-6874-1
issn	1520-6149 2379-190X
doi_str_mv	10.1109/ICASSP49660.2025.10888353
publisher	IEEE
college_str	Faculty of Science and Engineering
hierarchytype
hierarchy_top_id	facultyofscienceandengineering
hierarchy_top_title	Faculty of Science and Engineering
hierarchy_parent_id	facultyofscienceandengineering
hierarchy_parent_title	Faculty of Science and Engineering
department_str	School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science
document_store_str	1
active_str	0
description	Deep learning methods have recently shown significant promise in compressing the geometric features of point clouds. However, challenges arise when consecutive point clouds contain holes, resulting in incomplete information that complicates motion estimation. To our knowledge, most existing dynamic point cloud compression methods have largely overlooked this critical issue. Moreover, these methods typically employ a multi-scale single-pass approach for motion estimation, performing only one estimation at each scale. This limits accuracy and adversely impacts compression performance. To address these challenges, we propose a dynamic point cloud compression model called M2BR-DPCC (Multi-Modal Multi-Scale Bidirectional Recursion for Dynamic Point Cloud Compression). Our method introduces two key innovations. First, we integrate both point cloud and image data as inputs, leveraging a multi-modal feature representation completion (MFRepC) approach to align information across modalities. This addresses the issue of missing data in point clouds by using complementary information from images. Second, we implement a multi-scale bidirectional recursive (MSBR) motion estimation method. This module iteratively refines motion flows in both forward and backward directions, progressively enhancing point cloud features and improving motion estimation accuracy. Experimental results on widely used datasets, including MVUB and 8iVFB, demonstrate the effectiveness of our approach. Compared to existing methods, M2BR-DPCC achieves superior performance, with an average BD-rate improvement of 95.23% over V-PCC, 12.92% over D-DPCC, and 16.16% over patchDPCC. These results underscore the potential of leveraging multi-modal data and bidirectional refinement for dynamic point cloud compression.
published_date	2025-03-07T05:25:10Z
_version_	1858707820746113024
score	11.453587
