Data-driven Weight Initialization with Sylvester Solvers

Debasmit Das, Yash Bhalgat, Fatih Porikli

https://arxiv.org/abs/2105.10335

Main idea

We propose a data-driven scheme to initialize the parameters of a neural network. The initialization is cast as an optimization problem, which we restructure into the well-known Sylvester equation. This equation admits fast, efficient, gradient-free solutions. We show that our proposed method is especially effective in few-shot and fine-tuning settings.
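As an illustration of how an initialization objective can reduce to a Sylvester equation, the sketch below assumes one particular objective that this summary does not state: find a weight matrix W that maps layer inputs X to target latent codes Z while also reconstructing X from Z, i.e. minimize ||W X - Z||^2 + lam * ||X - W^T Z||^2. The names X, Z, W, lam, and sylvester_init are hypothetical and chosen only for this example. Setting the gradient to zero gives lam * Z Z^T W + W X X^T = (1 + lam) * Z X^T, which has the form A W + W B = Q and can be solved without gradient descent using SciPy's solve_sylvester.

```python
import numpy as np
from scipy.linalg import solve_sylvester


def sylvester_init(X, Z, lam=1.0):
    """Closed-form W (k x d) for an assumed encode/decode objective.

    Minimizes ||W X - Z||_F^2 + lam * ||X - W.T Z||_F^2 (an assumption,
    not necessarily the objective used in the paper).
    X: (d, n) layer inputs, Z: (k, n) target latent codes.
    """
    A = lam * (Z @ Z.T)          # (k, k), multiplies W from the left
    B = X @ X.T                  # (d, d), multiplies W from the right
    Q = (1.0 + lam) * (Z @ X.T)  # (k, d), right-hand side
    # solve_sylvester(A, B, Q) returns W such that A @ W + W @ B = Q.
    return solve_sylvester(A, B, Q)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 64))   # 8 input features, 64 samples
Z = rng.standard_normal((4, 64))   # 4 latent dimensions, 64 samples

W = sylvester_init(X, Z, lam=0.5)
print(W.shape)
```

The solution is unique when A and -B share no eigenvalues. In this sketch both matrices are positive semidefinite, so it suffices that X X^T is positive definite, which typically holds when there are more samples than input dimensions.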

Background

Deep neural networks have produced state-of-the-art recognition performance in areas such as computer vision (Szegedy et al., 2015; He et al., 2016), natural language processing (Devlin et al., 2018; Brown et al., 2020), and speech recognition (Nassif et al., 2019). The success of these models has mostly been attributed to the quality and quantity of datasets, complex architectures and algorithms, and advanced computing resources. However, less credit has been given to the development of novel and effective initialization schemes for these deep and complex architectures.

Results

Our data-driven method improves performance over random initialization methods, both before training starts and after training completes. The method is especially effective in few-shot and fine-tuning settings. We conclude the paper with analyses of time complexity and of the effect of different latent codes on recognition performance.