Neural Scaling Laws
I try to learn and explain neural scaling laws.
Simon Li
The work identifies four neural scaling regimes, formed by crossing two axes: the variable being scaled ($D$, the dataset size, or $P$, the number of model parameters) and the type of limit (variance-limited or resolution-limited).
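Putting the two axes together gives a 2×2 table (my own summary of the scalings detailed in the lists below; I write $\alpha_D$, $\alpha_P$ only to distinguish the two exponents). In the variance-limited column the other variable is held fixed; in the resolution-limited column it is effectively infinite.

| scaled variable | variance-limited | resolution-limited |
| --- | --- | --- |
| $D$ | loss $\propto 1/D$ | loss $\propto 1/D^{\alpha_D}$, $0 < \alpha_D < 1$ |
| $P$ | loss $\propto 1/\sqrt{P}$ (deep nets), $1/P$ (linear) | loss $\propto 1/P^{\alpha_P}$, $0 < \alpha_P < 1$ |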
The variance-limited regime is when
- one of $D$ or $P$ is held fixed
- we study the loss as the other parameter grows infinitely large
- the loss scales as $1 / x$, where
  - for deep networks, $x = D$ or $x = \sqrt{P} \propto$ width
  - for linear models, $x = D$ or $x = P$
The resolution-limited regime, in contrast, is when
- one of $D$ or $P$ is effectively infinite
- we study the loss as the other parameter increases
- the loss scales as $1 / x^\alpha$, with $0 < \alpha < 1$ for both $x = D$ and $x = P$
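To make the distinction concrete, here is a minimal sketch (my own, not from the paper) of how such exponents are typically estimated: fit the exponent $\alpha$ of a power law $L \propto 1/x^\alpha$ via the slope of a log-log plot. A variance-limited curve should recover $\alpha \approx 1$ and a resolution-limited one $\alpha < 1$. The constants and exponents below are made up purely for illustration.

```python
import numpy as np

def fit_exponent(x, loss):
    """Fit loss = c / x**alpha and return alpha from the log-log slope."""
    slope, _intercept = np.polyfit(np.log(x), np.log(loss), 1)
    return -slope  # loss ~ x**(-alpha), so the log-log slope is -alpha

x = np.logspace(2, 6, 20)  # e.g. dataset sizes D from 1e2 to 1e6

loss_variance_limited = 3.0 / x         # exponent 1: variance-limited
loss_resolution_limited = 3.0 / x**0.4  # exponent 0.4: resolution-limited

print(fit_exponent(x, loss_variance_limited))    # ~1.0
print(fit_exponent(x, loss_resolution_limited))  # ~0.4
```

In practice one would measure the loss at several values of $D$ (or $P$) and fit the same way; the log-log slope is the standard estimate of the scaling exponent.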