SMS scnews item created by Caroline Wormell at Sun 15 Feb 2026 0127
Type: Seminar
Distribution: World
Expiry: 25 Feb 2026
Calendar1: 25 Feb 2026 1200-1300
CalLoc1: Carslaw 451
CalTitle1: Choi: Optimized weight initialization on the Stiefel manifold for deep ReLU neural networks
Auth: caro@217.217.121.33 (cwor5378) in SMS-SAML

Applied Maths Seminar: Choi -- Optimized weight initialization on the Stiefel manifold for deep ReLU neural networks

Hayoung Choi (Kyungpook National University) will be visiting and will give a talk on
Wednesday 25th February at 12pm in Carslaw 451.  The talk will be followed by lunch;
all are welcome, and students get a free lunch.

Title: Optimized weight initialization on the Stiefel manifold for deep ReLU neural
networks 

Abstract: Deep learning has achieved remarkable success in computer vision, natural
language processing, and scientific data analysis, primarily due to its ability to
extract hierarchical representations from data.  At the heart of training deep neural
networks lies gradient descent, whose effectiveness depends crucially on how model
parameters are initialized.  Classical initialization strategies such as Xavier, He, and
orthogonal initialization aim to preserve variance or approximate isometry, and they
have enabled significant progress in stabilizing training.  However, as network depth
increases, these schemes often fail to prevent neuron inactivation ("dying ReLU") and
suffer from instability of activations and gradients.  In this talk, we will first
introduce the key ideas behind deep learning and gradient descent, then provide an
overview of standard initialization methods and their limitations.  I will then present
recent joint work on optimized weight initialization on the Stiefel manifold for deep
ReLU networks.  By formulating an optimization problem on the Stiefel manifold, we
derive an orthogonal initialization that not only preserves scale but also calibrates
pre-activation statistics at the outset.  A family of closed-form solutions and an
efficient sampling scheme are established.  Theoretical analysis shows that the scheme
prevents the dying ReLU problem, slows variance decay, and mitigates vanishing gradients,
ensuring more stable signal propagation.  Empirical studies on image benchmarks, tabular
data, and few-shot settings show that the proposed method consistently outperforms
existing initializations and enables reliable training in very deep architectures.
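
For those who would like to experiment before the talk, the short numpy sketch below
contrasts two of the standard schemes mentioned in the abstract: He initialization and a
plain orthogonal draw from the Stiefel manifold (obtained from a QR factorisation of a
Gaussian matrix).  It pushes a random batch through a deep ReLU stack and reports the
activation scale and the fraction of dead units under each scheme.  This is only an
illustration of the classical baselines, not the optimized construction from the
speaker's joint work, and the helper names (he_init, stiefel_init) are made up for this
example.

    import numpy as np

    def he_init(fan_out, fan_in, rng):
        # He (Kaiming) initialization: i.i.d. Gaussian entries with variance
        # 2/fan_in, the standard choice for ReLU networks.
        return rng.standard_normal((fan_out, fan_in)) * np.sqrt(2.0 / fan_in)

    def stiefel_init(fan_out, fan_in, rng):
        # A Haar-random matrix with orthonormal columns (a point on the Stiefel
        # manifold), obtained from the QR factorisation of a Gaussian matrix.
        n, m = max(fan_out, fan_in), min(fan_out, fan_in)
        a = rng.standard_normal((n, m))
        q, r = np.linalg.qr(a)
        q = q * np.sign(np.diag(r))   # sign fix so the draw is Haar-distributed
        return q if fan_out >= fan_in else q.T

    def relu(x):
        return np.maximum(x, 0.0)

    # Push a random batch through a deep ReLU stack under each scheme and report
    # the activation scale and the fraction of units that never fire ("dying ReLU").
    rng = np.random.default_rng(0)
    width, depth, batch = 256, 50, 512
    x0 = rng.standard_normal((batch, width))
    x_he, x_orth = x0, x0
    for _ in range(depth):
        x_he = relu(x_he @ he_init(width, width, rng).T)
        x_orth = relu(x_orth @ stiefel_init(width, width, rng).T)

    for name, x in [("He", x_he), ("orthogonal", x_orth)]:
        dead = np.mean(np.all(x == 0.0, axis=0))   # units inactive on the whole batch
        print(f"{name:>10}: activation std {x.std():.3e}, dead units {dead:.1%}")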