Conference Paper/Proceeding/Abstract
General hardware multicasting for fine-grained message-passing architectures
2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), Volume: 29th, Pages: 126 - 133
Swansea University Author: Shane Fleming
PDF | Accepted Manuscript
DOI (Published version): 10.1109/pdp52278.2021.00028
Abstract
Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patter...
Published in: | 2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) |
---|---|
ISBN: | 9781665414555 |
ISSN: | 1066-6192 2377-5750 |
Published: | IEEE 2021 |
URI: | https://cronfa.swan.ac.uk/Record/cronfa56452 |
first_indexed |
2021-03-16T10:02:35Z |
---|---|
last_indexed |
2021-11-25T04:14:12Z |
id |
cronfa56452 |
recordtype |
SURis |
fullrecord |
<?xml version="1.0"?><rfc1807><datestamp>2021-11-24T16:34:45.5606657</datestamp><bib-version>v2</bib-version><id>56452</id><entry>2021-03-16</entry><title>General hardware multicasting for fine-grained message-passing architectures</title><swanseaauthors><author><sid>fe23ad3ebacc194b4f4c480fdde55b95</sid><firstname>Shane</firstname><surname>Fleming</surname><name>Shane Fleming</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2021-03-16</date><deptcode>MACS</deptcode><abstract>Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patterns, which are prevalent in some application domains. To address this, we present new hardware primitives for multicast communication in rack-scale manycore systems. These primitives guarantee delivery to both colocated and distributed destinations, and can capture large unstructured communication patterns precisely. As a result, reliable multicast transfers among any number of software tasks, connected in any topology, can be fully offloaded to hardware. 
We implement the new primitives in a research platform consisting of 50K RISC-V threads distributed over 48 FPGAs, and demonstrate significant performance benefits on a range of applications expressed using a high-level vertex-centric programming model.</abstract><type>Conference Paper/Proceeding/Abstract</type><journal>2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)</journal><volume>29th</volume><journalNumber/><paginationStart>126</paginationStart><paginationEnd>133</paginationEnd><publisher>IEEE</publisher><placeOfPublication/><isbnPrint/><isbnElectronic>9781665414555</isbnElectronic><issnPrint>1066-6192</issnPrint><issnElectronic>2377-5750</issnElectronic><keywords/><publishedDay>1</publishedDay><publishedMonth>3</publishedMonth><publishedYear>2021</publishedYear><publishedDate>2021-03-01</publishedDate><doi>10.1109/pdp52278.2021.00028</doi><url/><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm/><lastEdited>2021-11-24T16:34:45.5606657</lastEdited><Created>2021-03-16T09:57:33.4741858</Created><path><level id="1">College of Science</level><level id="2">College of Science</level></path><authors><author><firstname>Matthew</firstname><surname>Naylor</surname><order>1</order></author><author><firstname>Simon W.</firstname><surname>Moore</surname><order>2</order></author><author><firstname>David</firstname><surname>Thomas</surname><order>3</order></author><author><firstname>Jonathan R.</firstname><surname>Beaumont</surname><order>4</order></author><author><firstname>Shane</firstname><surname>Fleming</surname><order>5</order></author><author><firstname>Mark</firstname><surname>Vousden</surname><order>6</order></author><author><firstname>A. 
Theodore</firstname><surname>Markettos</surname><order>7</order></author><author><firstname>Thomas</firstname><surname>Bytheway</surname><order>8</order></author><author><firstname>Andrew</firstname><surname>Brown</surname><order>9</order></author></authors><documents><document><filename>56452__19490__b5dc3b7b98bd4556a097a04cab8d08ce.pdf</filename><originalFilename>pdp2021-mcast-draft.pdf</originalFilename><uploaded>2021-03-16T10:01:48.8252523</uploaded><type>Output</type><contentLength>235400</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807> |
title |
General hardware multicasting for fine-grained message-passing architectures |
author_id_str_mv |
fe23ad3ebacc194b4f4c480fdde55b95 |
author_id_fullname_str_mv |
fe23ad3ebacc194b4f4c480fdde55b95_***_Shane Fleming |
author |
Shane Fleming |
author2 |
Matthew Naylor Simon W. Moore David Thomas Jonathan R. Beaumont Shane Fleming Mark Vousden A. Theodore Markettos Thomas Bytheway Andrew Brown |
format |
Conference Paper/Proceeding/Abstract |
container_title |
2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) |
container_volume |
29th |
container_start_page |
126 |
publishDate |
2021 |
institution |
Swansea University |
isbn |
9781665414555 |
issn |
1066-6192 2377-5750 |
doi_str_mv |
10.1109/pdp52278.2021.00028 |
publisher |
IEEE |
college_str |
College of Science |
hierarchytype |
|
hierarchy_top_id |
collegeofscience |
hierarchy_top_title |
College of Science |
hierarchy_parent_id |
collegeofscience |
hierarchy_parent_title |
College of Science |
department_str |
College of Science |
document_store_str |
1 |
active_str |
0 |
description |
Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patterns, which are prevalent in some application domains. To address this, we present new hardware primitives for multicast communication in rack-scale manycore systems. These primitives guarantee delivery to both colocated and distributed destinations, and can capture large unstructured communication patterns precisely. As a result, reliable multicast transfers among any number of software tasks, connected in any topology, can be fully offloaded to hardware. We implement the new primitives in a research platform consisting of 50K RISC-V threads distributed over 48 FPGAs, and demonstrate significant performance benefits on a range of applications expressed using a high-level vertex-centric programming model. |
published_date |
2021-03-01T07:47:25Z |