Journal article

Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA

Lucileide M. D. Da Silva, Matheus Torquato, Marcelo A. C. Fernandes

IEEE Access, Volume: 7, Pages: 2782 - 2798

Swansea University Author: Matheus Torquato

Abstract

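Q-learning is an off-policy reinforcement learning technique, which has the main advantage of obtaining an optimal policy interacting with an unknown model environment. This paper proposes a parallel fixed-point Q-learning algorithm architecture implemented on field programmable gate arrays (FPGA) focusing on optimizing the system processing time. The convergence results are presented, and the processing time and occupied area were analyzed for different states and actions sizes scenarios and various fixed-point formats. The studies concerning the accuracy of the Q-learning technique response and resolution error associated with a decrease in the number of bits were also carried out for hardware implementation. The architecture implementation details were featured. The entire project was developed using the system generator platform (Xilinx), with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA.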
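
As a rough illustration of what a fixed-point Q-learning update involves, the sketch below applies the standard tabular rule Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)] using scaled integers. It is a sequential software sketch only, not the parallel FPGA architecture from the paper, whose details are not given in this record. The function names, the 8 fractional bits, and the tiny two-state table are all assumptions chosen for the example.

```python
# Hypothetical sketch: tabular Q-learning update in fixed-point arithmetic.
# Values are stored as integers scaled by 2**frac_bits (an assumed format).

def to_fixed(x, frac_bits):
    """Convert a float to a scaled integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))

def to_float(q, frac_bits):
    """Convert a scaled integer back to a float."""
    return q / (1 << frac_bits)

def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point values and rescale (truncating low bits)."""
    return (a * b) >> frac_bits

def q_update(q_table, s, a, r, s_next, alpha, gamma, frac_bits):
    """Apply one Q-learning update in place; all numeric arguments are fixed-point."""
    best_next = max(q_table[s_next])                 # greedy value of the next state
    target = r + fixed_mul(gamma, best_next, frac_bits)
    q_table[s][a] += fixed_mul(alpha, target - q_table[s][a], frac_bits)
    return q_table[s][a]

FRAC_BITS = 8  # assumed resolution; fewer bits means a coarser Q-table
q = [[0, 0], [to_fixed(1.0, FRAC_BITS), 0]]          # 2 states x 2 actions
new_q = q_update(
    q, s=0, a=0,
    r=to_fixed(1.0, FRAC_BITS), s_next=1,
    alpha=to_fixed(0.5, FRAC_BITS), gamma=to_fixed(0.5, FRAC_BITS),
    frac_bits=FRAC_BITS,
)
print(to_float(new_q, FRAC_BITS))
```

Lowering `FRAC_BITS` in this sketch shrinks the stored word and coarsens the representable Q-values, which is the kind of resolution-versus-size trade-off the abstract says was studied.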

Published in: IEEE Access
ISSN: 2169-3536
Published: 2019

URI: https://cronfa.swan.ac.uk/Record/cronfa49022
first_indexed 2019-02-28T20:02:28Z
last_indexed 2019-03-19T19:56:38Z
id cronfa49022
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2019-03-19T15:33:15.6760076</datestamp><bib-version>v2</bib-version><id>49022</id><entry>2019-02-28</entry><title>Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA</title><swanseaauthors><author><sid>7a053c668886b4642286baed36fdba90</sid><ORCID>0000-0001-6356-3538</ORCID><firstname>Matheus</firstname><surname>Torquato</surname><name>Matheus Torquato</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2019-02-28</date><deptcode>MACS</deptcode><abstract>Q-learning is an off-policy reinforcement learning technique, which has the main advantage of obtaining an optimal policy interacting with an unknown model environment. This paper proposes a parallel fixed-point Q-learning algorithm architecture implemented on field programmable gate arrays (FPGA) focusing on optimizing the system processing time. The convergence results are presented, and the processing time and occupied area were analyzed for different states and actions sizes scenarios and various fixed-point formats. The studies concerning the accuracy of the Q-learning technique response and resolution error associated with a decrease in the number of bits were also carried out for hardware implementation. The architecture implementation details were featured. 
The entire project was developed using the system generator platform (Xilinx), with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA.</abstract><type>Journal Article</type><journal>IEEE Access</journal><volume>7</volume><paginationStart>2782</paginationStart><paginationEnd>2798</paginationEnd><publisher/><issnElectronic>2169-3536</issnElectronic><keywords/><publishedDay>31</publishedDay><publishedMonth>12</publishedMonth><publishedYear>2019</publishedYear><publishedDate>2019-12-31</publishedDate><doi>10.1109/ACCESS.2018.2885950</doi><url/><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm/><lastEdited>2019-03-19T15:33:15.6760076</lastEdited><Created>2019-02-28T14:16:38.0120581</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Engineering and Applied Sciences - Uncategorised</level></path><authors><author><firstname>Lucileide M. D.</firstname><surname>Da Silva</surname><order>1</order></author><author><firstname>Matheus</firstname><surname>Torquato</surname><orcid>0000-0001-6356-3538</orcid><order>2</order></author><author><firstname>Marcelo A. C.</firstname><surname>Fernandes</surname><order>3</order></author></authors><documents><document><filename>0049022-19032019103735.pdf</filename><originalFilename>dasilva2018.pdf</originalFilename><uploaded>2019-03-19T10:37:35.2770000</uploaded><type>Output</type><contentLength>6429418</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><embargoDate>2019-03-19T00:00:00.0000000</embargoDate><copyrightCorrect>false</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA
author_id_str_mv 7a053c668886b4642286baed36fdba90
author_id_fullname_str_mv 7a053c668886b4642286baed36fdba90_***_Matheus Torquato
author Matheus Torquato
author2 Lucileide M. D. Da Silva
Matheus Torquato
Marcelo A. C. Fernandes
format Journal article
container_title IEEE Access
container_volume 7
container_start_page 2782
publishDate 2019
institution Swansea University
issn 2169-3536
doi_str_mv 10.1109/ACCESS.2018.2885950
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Engineering and Applied Sciences - Uncategorised{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Engineering and Applied Sciences - Uncategorised
document_store_str 1
active_str 0
description Q-learning is an off-policy reinforcement learning technique, which has the main advantage of obtaining an optimal policy interacting with an unknown model environment. This paper proposes a parallel fixed-point Q-learning algorithm architecture implemented on field programmable gate arrays (FPGA) focusing on optimizing the system processing time. The convergence results are presented, and the processing time and occupied area were analyzed for different states and actions sizes scenarios and various fixed-point formats. The studies concerning the accuracy of the Q-learning technique response and resolution error associated with a decrease in the number of bits were also carried out for hardware implementation. The architecture implementation details were featured. The entire project was developed using the system generator platform (Xilinx), with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA.
published_date 2019-12-31T01:53:57Z
_version_ 1821368581983567872
score 11.04748