General hardware multicasting for fine-grained message-passing architectures

Naylor, Matthew; Moore, Simon W.; Thomas, David; Beaumont, Jonathan R.; Fleming, Shane; Vousden, Mark; Markettos, A. Theodore; Bytheway, Thomas; Brown, Andrew

doi:10.1109/pdp52278.2021.00028

Conference Paper/Proceeding/Abstract

2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), Pages: 126 - 133

Swansea University Author: Shane Fleming

PDF | Accepted Manuscript

DOI (Published version): 10.1109/pdp52278.2021.00028

Abstract

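Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patterns, which are prevalent in some application domains. To address this, we present new hardware primitives for multicast communication in rack-scale manycore systems. These primitives guarantee delivery to both colocated and distributed destinations, and can capture large unstructured communication patterns precisely. As a result, reliable multicast transfers among any number of software tasks, connected in any topology, can be fully offloaded to hardware. We implement the new primitives in a research platform consisting of 50K RISC-V threads distributed over 48 FPGAs, and demonstrate significant performance benefits on a range of applications expressed using a high-level vertex-centric programming model.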

Published in:	2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)
ISBN:	978-1-6654-4764-5 978-1-6654-1455-5
ISSN:	1066-6192 2377-5750
Published:	IEEE 2021
URI:	https://cronfa.swan.ac.uk/Record/cronfa56452

first_indexed	2021-03-16T10:02:35Z
last_indexed	2025-10-14T07:00:38Z
id	cronfa56452
recordtype	SURis
fullrecord	<?xml version="1.0"?><rfc1807><datestamp>2025-10-13T16:50:04.1954130</datestamp><bib-version>v2</bib-version><id>56452</id><entry>2021-03-16</entry><title>General hardware multicasting for fine-grained message-passing architectures</title><swanseaauthors><author><sid>fe23ad3ebacc194b4f4c480fdde55b95</sid><firstname>Shane</firstname><surname>Fleming</surname><name>Shane Fleming</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2021-03-16</date><deptcode>MACS</deptcode><abstract>Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patterns, which are prevalent in some application domains. To address this, we present new hardware primitives for multicast communication in rack-scale manycore systems. These primitives guarantee delivery to both colocated and distributed destinations, and can capture large unstructured communication patterns precisely. As a result, reliable multicast transfers among any number of software tasks, connected in any topology, can be fully offloaded to hardware. We implement the new primitives in a research platform consisting of 50K RISC-V threads distributed over 48 FPGAs, and demonstrate significant performance benefits on a range of applications expressed using a high-level vertex-centric programming model.</abstract><type>Conference Paper/Proceeding/Abstract</type><journal>2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)</journal><volume/><journalNumber/><paginationStart>126</paginationStart><paginationEnd>133</paginationEnd><publisher>IEEE</publisher><placeOfPublication/><isbnPrint>978-1-6654-4764-5</isbnPrint><isbnElectronic>978-1-6654-1455-5</isbnElectronic><issnPrint>1066-6192</issnPrint><issnElectronic>2377-5750</issnElectronic><keywords>Scalability, Computer architecture, Multicast communication, System recovery, Hardware, Software, Topology</keywords><publishedDay>1</publishedDay><publishedMonth>3</publishedMonth><publishedYear>2021</publishedYear><publishedDate>2021-03-01</publishedDate><doi>10.1109/pdp52278.2021.00028</doi><url/><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm>Not Required</apcterm><funders>This work was supported by UK EPSRC grant EP/N031768/1 (POETS project).</funders><projectreference/><lastEdited>2025-10-13T16:50:04.1954130</lastEdited><Created>2021-03-16T09:57:33.4741858</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer Science</level></path><authors><author><firstname>Matthew</firstname><surname>Naylor</surname><order>1</order></author><author><firstname>Simon W.</firstname><surname>Moore</surname><order>2</order></author><author><firstname>David</firstname><surname>Thomas</surname><order>3</order></author><author><firstname>Jonathan R.</firstname><surname>Beaumont</surname><order>4</order></author><author><firstname>Shane</firstname><surname>Fleming</surname><order>5</order></author><author><firstname>Mark</firstname><surname>Vousden</surname><order>6</order></author><author><firstname>A. Theodore</firstname><surname>Markettos</surname><order>7</order></author><author><firstname>Thomas</firstname><surname>Bytheway</surname><order>8</order></author><author><firstname>Andrew</firstname><surname>Brown</surname><order>9</order></author></authors><documents><document><filename>56452__19490__b5dc3b7b98bd4556a097a04cab8d08ce.pdf</filename><originalFilename>pdp2021-mcast-draft.pdf</originalFilename><uploaded>2021-03-16T10:01:48.8252523</uploaded><type>Output</type><contentLength>235400</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title	General hardware multicasting for fine-grained message-passing architectures
author_id_str_mv	fe23ad3ebacc194b4f4c480fdde55b95
author_id_fullname_str_mv	fe23ad3ebacc194b4f4c480fdde55b95_***_Shane Fleming
author	Shane Fleming
author2	Matthew Naylor Simon W. Moore David Thomas Jonathan R. Beaumont Shane Fleming Mark Vousden A. Theodore Markettos Thomas Bytheway Andrew Brown
format	Conference Paper/Proceeding/Abstract
container_title	2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)
container_start_page	126
publishDate	2021
institution	Swansea University
isbn	978-1-6654-4764-5 978-1-6654-1455-5
issn	1066-6192 2377-5750
doi_str_mv	10.1109/pdp52278.2021.00028
publisher	IEEE
college_str	Faculty of Science and Engineering
hierarchytype
hierarchy_top_id	facultyofscienceandengineering
hierarchy_top_title	Faculty of Science and Engineering
hierarchy_parent_id	facultyofscienceandengineering
hierarchy_parent_title	Faculty of Science and Engineering
department_str	School of Mathematics and Computer Science - Computer Science{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Mathematics and Computer Science - Computer Science
document_store_str	1
active_str	0
description	Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patterns, which are prevalent in some application domains. To address this, we present new hardware primitives for multicast communication in rack-scale manycore systems. These primitives guarantee delivery to both colocated and distributed destinations, and can capture large unstructured communication patterns precisely. As a result, reliable multicast transfers among any number of software tasks, connected in any topology, can be fully offloaded to hardware. We implement the new primitives in a research platform consisting of 50K RISC-V threads distributed over 48 FPGAs, and demonstrate significant performance benefits on a range of applications expressed using a high-level vertex-centric programming model.
published_date	2021-03-01T05:24:07Z
