Journal article

Traffic signal control using reinforcement learning based on the teacher-student framework

Junxiu Liu, Sheng Qin, Min Su, Yuling Luo, Shunsheng Zhang, Yanhu Wang, Scott Yang

Expert Systems with Applications, Volume: 228, Start page: 120458

Swansea University Author: Scott Yang

Abstract

Reinforcement Learning (RL) is an effective method for adaptive traffic signal control. As one type of RL, the teacher-student framework has been found helpful in improving model performance in various application fields (such as robot control, games, and hybrid intelligence), but it is rarely ap...

Published in: Expert Systems with Applications
ISSN: 0957-4174
Published: Elsevier BV 2023
Online Access: Check full text

URI: https://cronfa.swan.ac.uk/Record/cronfa63462
first_indexed 2023-05-16T09:20:48Z
last_indexed 2024-11-15T18:01:38Z
id cronfa63462
recordtype SURis
fullrecord <?xml version="1.0"?><rfc1807><datestamp>2024-07-29T14:25:27.7114512</datestamp><bib-version>v2</bib-version><id>63462</id><entry>2023-05-16</entry><title>Traffic signal control using reinforcement learning based on the teacher-student framework</title><swanseaauthors><author><sid>81dc663ca0e68c60908d35b1d2ec3a9b</sid><ORCID>0000-0002-6618-7483</ORCID><firstname>Scott</firstname><surname>Yang</surname><name>Scott Yang</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2023-05-16</date><deptcode>MACS</deptcode><abstract>Reinforcement Learning (RL) is an effective method for adaptive traffic signals control. As one type of RL, the teacher-student framework has been found helpful in improving the model performance for different application fields (such as robot control, game, hybrid intelligence), but it is rarely applied for traffic control due to that the hyper-parameters and the number of state-action pairs experienced are difficult to determine. In this work, the teacher-student framework is used for traffic signal control, where only a single reward function is designed to guide the student agent and by using this method the number of hyper-parameters and the model complexity are reduced. Specifically, the teacher agent uses an importance function to evaluate and guide the student, where the importance function combines with environment reward to form a synthetic reward for the student agent. 
Experimental results under different traffic environments show that the proposed method achieves the expected performance enhancement and is better than most of the state-of-the-art RL-based traffic signal control methods.</abstract><type>Journal Article</type><journal>Expert Systems with Applications</journal><volume>228</volume><journalNumber/><paginationStart>120458</paginationStart><paginationEnd/><publisher>Elsevier BV</publisher><placeOfPublication/><isbnPrint/><isbnElectronic/><issnPrint>0957-4174</issnPrint><issnElectronic/><keywords/><publishedDay>15</publishedDay><publishedMonth>10</publishedMonth><publishedYear>2023</publishedYear><publishedDate>2023-10-15</publishedDate><doi>10.1016/j.eswa.2023.120458</doi><url/><notes/><college>COLLEGE NANME</college><department>Mathematics and Computer Science School</department><CollegeCode>COLLEGE CODE</CollegeCode><DepartmentCode>MACS</DepartmentCode><institution>Swansea University</institution><apcterm/><funders>This research is supported by the National Natural Science Foundation of China under Grant 61976063, the Guangxi Natural Science Foundation under Grant 2022GXNSFFA035028, research fund of Guangxi Normal University under Grant 2021JC006, the AI+Education research project of Guangxi Humanities Society Science Development Research Center under Grant ZXZJ202205.</funders><projectreference/><lastEdited>2024-07-29T14:25:27.7114512</lastEdited><Created>2023-05-16T10:18:56.1178393</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Mathematics and Computer Science - Computer 
Science</level></path><authors><author><firstname>Junxiu</firstname><surname>Liu</surname><order>1</order></author><author><firstname>Sheng</firstname><surname>Qin</surname><orcid>0000-0001-7348-901x</orcid><order>2</order></author><author><firstname>Min</firstname><surname>Su</surname><order>3</order></author><author><firstname>Yuling</firstname><surname>Luo</surname><orcid>0000-0002-0117-4614</orcid><order>4</order></author><author><firstname>Shunsheng</firstname><surname>Zhang</surname><order>5</order></author><author><firstname>Yanhu</firstname><surname>Wang</surname><order>6</order></author><author><firstname>Scott</firstname><surname>Yang</surname><orcid>0000-0002-6618-7483</orcid><order>7</order></author></authors><documents><document><filename>63462__27484__46f45335580f42078d1293f83bdd79da.pdf</filename><originalFilename>63462.pdf</originalFilename><uploaded>2023-05-16T12:00:15.9943888</uploaded><type>Output</type><contentLength>685373</contentLength><contentType>application/pdf</contentType><version>Accepted Manuscript</version><cronfaStatus>true</cronfaStatus><embargoDate>2024-05-12T00:00:00.0000000</embargoDate><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807>
title Traffic signal control using reinforcement learning based on the teacher-student framework
author_id_str_mv 81dc663ca0e68c60908d35b1d2ec3a9b
author_id_fullname_str_mv 81dc663ca0e68c60908d35b1d2ec3a9b_***_Scott Yang
author Scott Yang
author2 Junxiu Liu
Sheng Qin
Min Su
Yuling Luo
Shunsheng Zhang
Yanhu Wang
Scott Yang
format Journal article
container_title Expert Systems with Applications
container_volume 228
container_start_page 120458
publishDate 2023
institution Swansea University
issn 0957-4174
doi_str_mv 10.1016/j.eswa.2023.120458
publisher Elsevier BV
college_str Faculty of Science and Engineering
hierarchytype
hierarchy_top_id facultyofscienceandengineering
hierarchy_top_title Faculty of Science and Engineering
hierarchy_parent_id facultyofscienceandengineering
hierarchy_parent_title Faculty of Science and Engineering
department_str School of Mathematics and Computer Science - Computer Science, Faculty of Science and Engineering
document_store_str 1
active_str 0
description Reinforcement Learning (RL) is an effective method for adaptive traffic signal control. As one type of RL, the teacher-student framework has been found helpful in improving model performance in various application fields (such as robot control, games, and hybrid intelligence), but it is rarely applied to traffic control because the hyper-parameters and the number of state-action pairs experienced are difficult to determine. In this work, the teacher-student framework is used for traffic signal control, where only a single reward function is designed to guide the student agent; this reduces both the number of hyper-parameters and the model complexity. Specifically, the teacher agent uses an importance function to evaluate and guide the student, and the importance function is combined with the environment reward to form a synthetic reward for the student agent. Experimental results under different traffic environments show that the proposed method achieves the expected performance enhancement and outperforms most state-of-the-art RL-based traffic signal control methods.
published_date 2023-10-15T20:22:16Z
_version_ 1821347714948923392
score 11.04748
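
The description field above says that a teacher's importance function is combined with the environment reward to form a synthetic reward for the student, but this record does not give the formula. The Python sketch below is only an illustration under stated assumptions: importance is taken as the spread between the teacher's highest and lowest action values in a state (a definition commonly used in the teacher-student advising literature), and the combination is assumed to be additive with a hypothetical weight `beta`. The function names, the sign convention, and the example values are all invented here and may differ from the published method.

```python
from typing import Sequence


def importance(teacher_q_values: Sequence[float]) -> float:
    """Assumed importance measure: gap between the teacher's best and
    worst action values in the current state."""
    return max(teacher_q_values) - min(teacher_q_values)


def synthetic_reward(
    env_reward: float,
    teacher_q_values: Sequence[float],
    student_action: int,
    beta: float = 0.5,  # hypothetical weight, not taken from the paper
) -> float:
    """Assumed additive combination of environment reward and guidance.

    The guidance term is positive when the student picks the action the
    teacher values most, and negative otherwise.
    """
    best_action = max(range(len(teacher_q_values)),
                      key=lambda a: teacher_q_values[a])
    gap = importance(teacher_q_values)
    guidance = gap if student_action == best_action else -gap
    return env_reward + beta * guidance


if __name__ == "__main__":
    # Illustrative values only: three signal phases, and an environment
    # reward such as a negative queue length.
    q_values = [1.0, 3.0, 2.0]
    print(synthetic_reward(-4.0, q_values, student_action=1))
    print(synthetic_reward(-4.0, q_values, student_action=0))
```

Under these assumptions the student trains on a single scalar reward, which is consistent with the description's point that only one reward function guides the student agent.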