Journal article

Phase diagram and eigenvalue dynamics of stochastic gradient descent in multilayer neural networks

Chanju Park, Biagio Lucini, Gert Aarts

Machine Learning: Science and Technology, Volume: 6, Issue: 4, Start page: 045048

Swansea University Authors: Chanju Park, Gert Aarts

  • 70965.VoR.pdf (PDF, Version of Record, 931.25 KB)

    © 2025 The Author(s). Released under the terms of the Creative Commons Attribution 4.0 licence.

Abstract

Hyperparameter tuning is one of the essential steps to guarantee the convergence of machine learning models. We argue that intuition about the optimal choice of hyperparameters for stochastic gradient descent can be obtained by studying a neural network’s phase diagram, in which each phase is characterised by distinctive dynamics of the singular values of weight matrices. Taking inspiration from disordered systems, we start from the observation that the loss landscape of a multilayer neural network with mean squared error can be interpreted as a disordered system in feature space, where the learnt features are mapped to soft spin degrees of freedom, the initial variance of the weight matrices is interpreted as the strength of the disorder, and temperature is given by the ratio of the learning rate and the batch size. As the model is trained, three phases can be identified, in which the dynamics of weight matrices is qualitatively different. Employing a Langevin equation for stochastic gradient descent, previously derived using Dyson Brownian motion, we demonstrate that the three dynamical regimes can be classified effectively, providing practical guidance for the choice of hyperparameters of the optimiser.
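
The Langevin equation mentioned in the abstract was derived in the authors' earlier work via Dyson Brownian motion; the equation below is only a generic sketch of that form, with schematic coefficients (the potential V, the prefactors, and the identification of the λ_i are assumptions here, not taken from the paper):

    \mathrm{d}\lambda_i = \Big( -\frac{\partial V}{\partial \lambda_i}
        + \sum_{j \neq i} \frac{1}{\lambda_i - \lambda_j} \Big)\,\mathrm{d}t
        + \sqrt{2T}\,\mathrm{d}W_i ,
    \qquad T \propto \frac{\eta}{|B|} ,

where the λ_i are eigenvalues (squared singular values) of a weight matrix, the first term is a drift set by the loss, the pairwise sum is the characteristic Dyson eigenvalue repulsion, dW_i is a Wiener increment, and the effective temperature T is set by the ratio of the learning rate η to the batch size |B|, matching the abstract's identification of temperature.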

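As a minimal, self-contained sketch of the quantity the paper tracks (not the authors' code: the toy data, network sizes, hyperparameter values, and the choice of a linear network below are illustrative assumptions), one can log the singular values of a weight matrix while training a small two-layer network with minibatch SGD on a mean squared error loss:

    # Illustrative sketch: track singular values of the first weight matrix
    # of a two-layer linear network under minibatch SGD with MSE loss.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy regression data: targets from a fixed "teacher" linear map plus noise.
    n_in, n_hidden, n_out, n_data = 16, 16, 4, 1024
    X = rng.normal(size=(n_data, n_in))
    teacher = rng.normal(size=(n_in, n_out)) / np.sqrt(n_in)
    Y = X @ teacher + 0.01 * rng.normal(size=(n_data, n_out))

    # Hyperparameters: the effective "temperature" is the ratio lr / batch_size.
    lr, batch_size, n_steps = 0.05, 32, 2000
    sigma_w = 1.0  # initial weight variance ~ disorder strength in the analogy

    W1 = sigma_w * rng.normal(size=(n_in, n_hidden)) / np.sqrt(n_in)
    W2 = sigma_w * rng.normal(size=(n_hidden, n_out)) / np.sqrt(n_hidden)

    history = []  # snapshots of the singular-value spectrum of W1
    for step in range(n_steps):
        idx = rng.integers(0, n_data, size=batch_size)
        x, y = X[idx], Y[idx]
        h = x @ W1          # hidden layer (linear activation for brevity)
        pred = h @ W2
        err = pred - y      # gradient of (1/2)||pred - y||^2 w.r.t. pred
        gW2 = h.T @ err / batch_size
        gW1 = x.T @ (err @ W2.T) / batch_size
        W1 -= lr * gW1
        W2 -= lr * gW2
        if step % 100 == 0:
            history.append(np.linalg.svd(W1, compute_uv=False))

    # Each row is one snapshot of the spectrum; its drift and spread over
    # training is the diagnostic the phase diagram organises.
    spectra = np.array(history)
    print(spectra.shape, spectra[0][:3], spectra[-1][:3])

Sweeping lr and batch_size at fixed initial weight variance sigma_w, and observing whether the spectrum freezes, drifts coherently, or diffuses, is the kind of experiment that distinguishes the three dynamical regimes described above.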

Published in: Machine Learning: Science and Technology
ISSN: 2632-2153
Published: IOP Publishing 2025

URI: https://cronfa.swan.ac.uk/Record/cronfa70965
Keywords: multilayer perceptron, phase diagram, stochastic gradient descent, Dyson Brownian motion, hyperparameters
College: Faculty of Science and Engineering
Funders: CP is further supported by the UKRI AIMLAC CDT EP/S023992/1. GA and BL are supported by STFC Consolidated Grant ST/T000813/1. BL is further supported by the UKRI EPSRC ExCALIBUR ExaTEPP Project EP/X017168/1. We acknowledge the support of the Supercomputing Wales project, which is part-funded by the European Regional Development Fund (ERDF) via Welsh Government.