Journal article

Multiple intersections traffic signal control based on cooperative multi-agent reinforcement learning

Junxiu Liu, Sheng Qin, Min Su, Yuling Luo, Yanhu Wang, Scott Yang

Information Sciences, Volume: 647, Start page: 119484

Swansea University Author: Scott Yang

Abstract

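For the multi-agent traffic signal controls, the traffic signal at each intersection is controlled by an independent agent. Since the control policy for each agent is dynamic, when the traffic scale is large, the adjustment of the agent's policy brings non-stationary effects over surrounding intersections, leading to the instability of the overall system. Therefore, there is the necessity to eliminate this non-stationarity effect to stabilize the multi-agent system. A collaborative multi-agent reinforcement learning method is proposed in this work to enable the system to overcome the instability problem through a collaborative mechanism. Decentralized learning with limited communication is used to reduce the communication latency between agents. The Shapley value reward function is applied to comprehensively calculate the contribution of each agent to avoid the influence of reward function coefficient variation, thereby reducing unstable factors. The Kullback-Leibler divergence is then used to distinguish the current and historical policies, and the loss function is optimized to eliminate the environmental non-stationarity. Experimental results demonstrate that the average travel time and its standard deviation are reduced by using the Shapley value reward function and optimized loss function, respectively, and this work provides an alternative for traffic signal controls on multiple intersections.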
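
The abstract recorded for this article names two standard tools: the Shapley value, used to credit each agent's contribution, and the Kullback-Leibler divergence, used to compare current and historical policies. The sketch below illustrates only the textbook definitions of both quantities. It is not the authors' implementation: the three intersections "A", "B", "C", the coalition values in `toy_values`, and the two example policies are hypothetical numbers chosen purely for illustration.

```python
from itertools import permutations
from math import log


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over every ordering of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same actions.
    eps guards against division by zero; it is an assumption of this sketch."""
    return sum(pi * log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical coalition payoffs (e.g. travel time saved), not data from the paper.
toy_values = {
    frozenset(): 0.0,
    frozenset("A"): 10.0,
    frozenset("B"): 20.0,
    frozenset("C"): 30.0,
    frozenset("AB"): 40.0,
    frozenset("AC"): 50.0,
    frozenset("BC"): 60.0,
    frozenset("ABC"): 90.0,
}

credit = shapley_values(["A", "B", "C"], lambda s: toy_values[frozenset(s)])

# Hypothetical action distributions for one agent: an older policy and the current one.
historical_policy = [0.5, 0.5]
current_policy = [0.9, 0.1]
drift = kl_divergence(current_policy, historical_policy)
```

The per-agent credits in `credit` always sum to the value of the full coalition, which is the efficiency property that makes the Shapley value attractive for sharing a joint reward. Exact enumeration grows factorially with the number of agents, so a network of many intersections would need a sampling approximation; how the paper handles this is not stated in this record.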

Published in: Information Sciences
ISSN: 0020-0255
Published: Elsevier BV 2023

URI: https://cronfa.swan.ac.uk/Record/cronfa64123
first_indexed 2023-08-24T08:41:13Z
last_indexed 2024-11-25T14:13:28Z
id cronfa64123
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2024-09-05T12:03:56.2013822</datestamp><bib-version>v2</bib-version><id>64123</id><entry>2023-08-24</entry><title>Multiple intersections traffic signal control based on cooperative multi-agent reinforcement learning</title><swanseaauthors><author><sid>81dc663ca0e68c60908d35b1d2ec3a9b</sid><ORCID>0000-0002-6618-7483</ORCID><firstname>Scott</firstname><surname>Yang</surname><name>Scott Yang</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2023-08-24</date><deptcode>MACS</deptcode><abstract>For the multi-agent traffic signal controls, the traffic signal at each intersection is controlled by an independent agent. Since the control policy for each agent is dynamic, when the traffic scale is large, the adjustment of the agent's policy brings non-stationary effects over surrounding intersections, leading to the instability of the overall system. Therefore, there is the necessity to eliminate this non-stationarity effect to stabilize the multi-agent system. A collaborative multi-agent reinforcement learning method is proposed in this work to enable the system to overcome the instability problem through a collaborative mechanism. Decentralized learning with limited communication is used to reduce the communication latency between agents. The Shapley value reward function is applied to comprehensively calculate the contribution of each agent to avoid the influence of reward function coefficient variation, thereby reducing unstable factors. The Kullback-Leibler divergence is then used to distinguish the current and historical policies, and the loss function is optimized to eliminate the environmental non-stationarity. 
Experimental results demonstrate that the average travel time and its standard deviation are reduced by using the Shapley value reward function and optimized loss function, respectively, and this work provides an alternative for traffic signal controls on multiple intersections.</abstract><type>Journal Article</type><journal>Information Sciences</journal><volume>647</volume><journalNumber/><paginationStart>119484</paginationStart><paginationEnd/><publisher>Elsevier BV</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>0020-0255</issnPrint><issnElectronic/><keywords>Traffic signal control, Reinforcement learning, Multi-agent system</keywords><publishedDay>1</publishedDay><publishedMonth>11</publishedMonth><publishedYear>2023</publishedYear><publishedDate>2023-11-01</publishedDate><doi>10.1016/j.ins.2023.119484</doi><url>http://dx.doi.org/10.1016/j.ins.2023.119484</url><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm/><funders>This research is supported by the National Natural Science Foundation of China under Grant 61976063, the Guangxi Natural Science Foundation under Grant 2022GXNSFFA035028, research fund of Guangxi Normal University under Grant 2021JC006, the AI+Education research project of Guangxi Humanities Society Science Development Research Center under Grant ZXZJ202205.</funders><projectreference/><lastEdited>2024-09-05T12:03:56.2013822</lastEdited><Created>2023-08-24T09:33:55.4014315</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer 
Science</level></path><authors><author><firstname>Junxiu</firstname><surname>Liu</surname><order>1</order></author><author><firstname>Sheng</firstname><surname>Qin</surname><order>2</order></author><author><firstname>Min</firstname><surname>Su</surname><order>3</order></author><author><firstname>Yuling</firstname><surname>Luo</surname><orcid>0000-0002-0117-4614</orcid><order>4</order></author><author><firstname>Yanhu</firstname><surname>Wang</surname><order>5</order></author><author><firstname>Scott</firstname><surname>Yang</surname><orcid>0000-0002-6618-7483</orcid><order>6</order></author></authors><documents><document><filename>64123__28393__f268c75ba03c45f7b733701a1d49c120.pdf</filename><originalFilename>64123.pdf</originalFilename><uploaded>2023-08-29T14:46:36.0310929</uploaded><type>Output</type><contentLength>868842</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><embargoDate>2024-08-10T00:00:00.0000000</embargoDate><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>