
Journal article

Phase diagram and eigenvalue dynamics of stochastic gradient descent in multilayer neural networks

Chanju Park, Biagio Lucini, Gert Aarts (ORCID: 0000-0002-6038-3782)

Machine Learning: Science and Technology, Volume: 6, Issue: 4, Start page: 045048

Swansea University Authors: Chanju Park, Gert Aarts

  • 70965.VoR.pdf

    PDF | Version of Record

    © 2025 The Author(s). Released under the terms of the Creative Commons Attribution 4.0 licence.

    Download (931.25KB)

Abstract

Hyperparameter tuning is one of the essential steps to guarantee the convergence of machine learning models. We argue that intuition about the optimal choice of hyperparameters for stochastic gradient descent can be obtained by studying a neural network’s phase diagram, in which each phase is characterised by distinctive dynamics of the singular values of weight matrices. Taking inspiration from disordered systems, we start from the observation that the loss landscape of a multilayer neural network with mean squared error can be interpreted as a disordered system in feature space, where the learnt features are mapped to soft spin degrees of freedom, the initial variance of the weight matrices is interpreted as the strength of the disorder, and temperature is given by the ratio of the learning rate and the batch size. As the model is trained, three phases can be identified, in which the dynamics of weight matrices is qualitatively different. Employing a Langevin equation for stochastic gradient descent, previously derived using Dyson Brownian motion, we demonstrate that the three dynamical regimes can be classified effectively, providing practical guidance for the choice of hyperparameters of the optimiser.

Published in: Machine Learning: Science and Technology
ISSN: 2632-2153
Published: IOP Publishing, 30 December 2025
DOI: 10.1088/2632-2153/ae1acc
Keywords: multilayer perceptron, phase diagram, stochastic gradient descent, Dyson Brownian motion, hyperparameters
Department: School of Biosciences, Geography and Physics - Physics, Faculty of Science and Engineering
Open access: SU Library paid the OA fee (TA Institutional Deal)

Funding: CP is further supported by the UKRI AIMLAC CDT EP/S023992/1. GA and BL are supported by STFC Consolidated Grant ST/T000813/1. BL is further supported by the UKRI EPSRC ExCALIBUR ExaTEPP Project EP/X017168/1. We acknowledge the support of the Supercomputing Wales project, which is part-funded by the European Regional Development Fund (ERDF) via Welsh Government.

URI: https://cronfa.swan.ac.uk/Record/cronfa70965