
Conference Paper/Proceeding/Abstract 705 views 176 downloads

General hardware multicasting for fine-grained message-passing architectures

Matthew Naylor, Simon W. Moore, David Thomas, Jonathan R. Beaumont, Shane Fleming, Mark Vousden, A. Theodore Markettos, Thomas Bytheway, Andrew Brown

2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), Volume: 29th, Pages: 126 - 133

Swansea University Author: Shane Fleming

Abstract

Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patter...


Published in: 2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)
ISBN: 9781665414555
ISSN: 1066-6192 2377-5750
Published: IEEE 2021

URI: https://cronfa.swan.ac.uk/Record/cronfa56452
first_indexed 2021-03-16T10:02:35Z
last_indexed 2021-11-25T04:14:12Z
id cronfa56452
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2021-11-24T16:34:45.5606657</datestamp><bib-version>v2</bib-version><id>56452</id><entry>2021-03-16</entry><title>General hardware multicasting for fine-grained message-passing architectures</title><swanseaauthors><author><sid>fe23ad3ebacc194b4f4c480fdde55b95</sid><firstname>Shane</firstname><surname>Fleming</surname><name>Shane Fleming</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2021-03-16</date><deptcode>FGSEN</deptcode><abstract>Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patterns, which are prevalent in some application domains. To address this, we present new hardware primitives for multicast communication in rack-scale manycore systems. These primitives guarantee delivery to both colocated and distributed destinations, and can capture large unstructured communication patterns precisely. As a result, reliable multicast transfers among any number of software tasks, connected in any topology, can be fully offloaded to hardware. We implement the new primitives in a research platform consisting of 50K RISC-V threads distributed over 48 FPGAs, and demonstrate significant performance benefits on a range of applications expressed using a high-level vertex-centric programming model.</abstract><type>Conference Paper/Proceeding/Abstract</type><journal>2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)</journal><volume>29th</volume><journalNumber/><paginationStart>126</paginationStart><paginationEnd>133</paginationEnd><publisher>IEEE</publisher><placeOfPublication/><isbnPrint/><isbnElectronic>9781665414555</isbnElectronic><issnPrint>1066-6192</issnPrint><issnElectronic>2377-5750</issnElectronic><keywords/><publishedDay>1</publishedDay><publishedMonth>3</publishedMonth><publishedYear>2021</publishedYear><publishedDate>2021-03-01</publishedDate><doi>10.1109/pdp52278.2021.00028</doi><url/><notes/><college>COLLEGE NANME</college><department>Science and Engineering - Faculty</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>FGSEN</DepartmentCode><institution>Swansea University</institution><apcterm/><lastEdited>2021-11-24T16:34:45.5606657</lastEdited><Created>2021-03-16T09:57:33.4741858</Created><path><level id="1">College of Science</level><level id="2">College of Science</level></path><authors><author><firstname>Matthew</firstname><surname>Naylor</surname><order>1</order></author><author><firstname>Simon W.</firstname><surname>Moore</surname><order>2</order></author><author><firstname>David</firstname><surname>Thomas</surname><order>3</order></author><author><firstname>Jonathan R.</firstname><surname>Beaumont</surname><order>4</order></author><author><firstname>Shane</firstname><surname>Fleming</surname><order>5</order></author><author><firstname>Mark</firstname><surname>Vousden</surname><order>6</order></author><author><firstname>A. Theodore</firstname><surname>Markettos</surname><order>7</order></author><author><firstname>Thomas</firstname><surname>Bytheway</surname><order>8</order></author><author><firstname>Andrew</firstname><surname>Brown</surname><order>9</order></author></authors><documents><document><filename>56452__19490__b5dc3b7b98bd4556a097a04cab8d08ce.pdf</filename><originalFilename>pdp2021-mcast-draft.pdf</originalFilename><uploaded>2021-03-16T10:01:48.8252523</uploaded><type>Output</type><contentLength>235400</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title General hardware multicasting for fine-grained message-passing architectures
author_id_str_mv fe23ad3ebacc194b4f4c480fdde55b95
author_id_fullname_str_mv fe23ad3ebacc194b4f4c480fdde55b95_***_Shane Fleming
author Shane Fleming
author2 Matthew Naylor
Simon W. Moore
David Thomas
Jonathan R. Beaumont
Shane Fleming
Mark Vousden
A. Theodore Markettos
Thomas Bytheway
Andrew Brown
format Conference Paper/Proceeding/Abstract
container_title 2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)
container_volume 29th
container_start_page 126
publishDate 2021
institution Swansea University
isbn 9781665414555
issn 1066-6192
2377-5750
doi_str_mv 10.1109/pdp52278.2021.00028
publisher IEEE
college_str College of Science
hierarchytype
hierarchy_top_id collegeofscience
hierarchy_top_title College of Science
hierarchy_parent_id collegeofscience
hierarchy_parent_title College of Science
department_str College of Science
document_store_str 1
active_str 0
description Manycore architectures are increasingly favouring message-passing or partitioned global address spaces (PGAS) over cache coherency for reasons of power efficiency and scalability. However, in the absence of cache coherency, there can be a lack of hardware support for one-to-many communication patterns, which are prevalent in some application domains. To address this, we present new hardware primitives for multicast communication in rack-scale manycore systems. These primitives guarantee delivery to both colocated and distributed destinations, and can capture large unstructured communication patterns precisely. As a result, reliable multicast transfers among any number of software tasks, connected in any topology, can be fully offloaded to hardware. We implement the new primitives in a research platform consisting of 50K RISC-V threads distributed over 48 FPGAs, and demonstrate significant performance benefits on a range of applications expressed using a high-level vertex-centric programming model.
published_date 2021-03-01T04:11:25Z
_version_ 1763753785544409088
score 11.013686