Notes on “Deep Sparse Rectifier Neural Networks”, Xavier Glorot, Antoine Bordes, and Yoshua Bengio (2011)


This paper is almost entirely empirical; it contains hardly any mathematical analysis. Every conclusion below is therefore drawn from experimental results.

The main gaps between neuroscience models and ML models

  1. Neurons encode information in a sparse and distributed way. This corresponds to a trade-off between the richness of representation and small action potential energy expenditure. However, without additional regularization, ordinary feed-forward neural networks do not have this property. This is not only biologically implausible but also hurts gradient-based optimization.
  2. Sigmoid and tanh are equivalent up to a linear transformation (rescaling and shifting the axes turns one into the other). The tanh has a steady state at 0 and is therefore preferred from the optimization standpoint, but it forces an antisymmetry around 0, which is absent in biological neurons.
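
The linear relation between sigmoid and tanh in item 2 is $\tanh(x) = 2\,\mathrm{sigmoid}(2x) - 1$. Below is a minimal numerical check using only the Python standard library; the function names and sample points are illustrative choices, not something taken from the paper.

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_from_sigmoid(x):
    """tanh written as a rescaled and shifted sigmoid: 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

# Arbitrary sample points, chosen only for illustration.
points = [-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0]

# Largest absolute gap between the library tanh and the sigmoid-based form.
max_error = max(abs(math.tanh(x) - tanh_from_sigmoid(x)) for x in points)
print(max_error)
```

The gap should be at the level of floating-point rounding, which is what "equivalent up to a linear transformation" means in practice.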

Potential problems of rectifiers

  1. The hard saturation at 0 may hurt optimization by blocking gradient back-propagation.
  2. In order to efficiently represent symmetric or antisymmetric behaviour in the data, a rectifier network would need twice as many hidden units as a network with symmetric or antisymmetric activation functions.
  3. Rectifier networks are subject to ill-conditioning of the parametrization.
    • Consider for each layer of depth $i$ of the network a scalar $\alpha_i > 0$, and scale the parameters as $w_i'=\frac{w_i}{\alpha_i}$ and $b_i'=\frac{b_i}{\prod_{j=1}^{i}{\alpha_j}}$. For a network with $n$ layers, the output unit values then change as follows: $s'=\frac{s}{\prod_{j=1}^{n}{\alpha_j}}$
    • Therefore, as long as $\prod_{j=1}^{n}{\alpha_j}=1$, the network function is identical, so many different parameter settings represent exactly the same function.
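
The rescaling argument in item 3 can be checked on a toy network. The sketch below is only an illustration under stated assumptions: the layer sizes, the random weights, the linear output layer, and the helper names `forward` and `rescale` are all hypothetical, and the scalars are taken to be positive so that the rescaling passes through the rectifier.

```python
import random

def relu(v):
    """Rectifier applied element-wise: max(0, x)."""
    return [max(0.0, x) for x in v]

def affine(W, b, v):
    """Compute W v + b for a weight matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def forward(layers, x):
    """Rectifier hidden layers followed by a linear output layer (an assumption)."""
    h = x
    for W, b in layers[:-1]:
        h = relu(affine(W, b, h))
    W_out, b_out = layers[-1]
    return affine(W_out, b_out, h)

def rescale(layers, alphas):
    """Apply w_i' = w_i / alpha_i and b_i' = b_i / (alpha_1 * ... * alpha_i)."""
    scaled, running_product = [], 1.0
    for (W, b), alpha in zip(layers, alphas):
        running_product *= alpha
        scaled.append((
            [[w / alpha for w in row] for row in W],
            [bias / running_product for bias in b],
        ))
    return scaled

rng = random.Random(0)
sizes = [4, 5, 5, 3]  # input, two hidden layers, output (arbitrary toy sizes)
layers = [
    ([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
     [rng.uniform(-1, 1) for _ in range(n_out)])
    for n_in, n_out in zip(sizes, sizes[1:])
]
x = [rng.uniform(-1, 1) for _ in range(sizes[0])]

s = forward(layers, x)
s_same = forward(rescale(layers, [2.0, 0.25, 2.0]), x)   # product of alphas = 1
s_eighth = forward(rescale(layers, [2.0, 2.0, 2.0]), x)  # product of alphas = 8

# Both gaps should be at the level of floating-point rounding.
print(max(abs(a - b) for a, b in zip(s, s_same)))
print(max(abs(a - 8.0 * b) for a, b in zip(s, s_eighth)))
```

When the scalars multiply to 1 the outputs match even though every weight matrix has changed, which is the ill-conditioning: the loss surface contains whole families of equivalent parameter settings.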

Advantages of rectifiers

  1. Rectifiers are not only biologically plausible, they are also computationally efficient.
  2. Unsupervised pre-training brings almost no improvement with rectifier activations, so purely supervised training already performs well.
  3. Networks trained with the rectifier activation function can find local minima of greater or equal quality than those obtained with its smooth counterpart, the softplus.
  4. Rectifier networks are truly sparse.
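
To illustrate what "truly sparse" means in item 4, the sketch below compares the rectifier with the softplus on simulated pre-activations. Drawing the pre-activations from a standard Gaussian is an assumption made purely for illustration, not the paper's experimental setup.

```python
import math
import random

def rectifier(x):
    """Rectifier: max(0, x), which outputs an exact zero for non-positive inputs."""
    return max(0.0, x)

def softplus(x):
    """Softplus: log(1 + exp(x)), a smooth and strictly positive counterpart."""
    return math.log1p(math.exp(x))

rng = random.Random(0)
# Simulated pre-activations (assumption: zero-mean, unit-variance Gaussian).
pre_activations = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Fraction of units whose output is exactly zero under each activation.
rectifier_zero_fraction = (
    sum(rectifier(a) == 0.0 for a in pre_activations) / len(pre_activations)
)
softplus_zero_fraction = (
    sum(softplus(a) == 0.0 for a in pre_activations) / len(pre_activations)
)

print(rectifier_zero_fraction, softplus_zero_fraction)
```

With symmetric pre-activations, roughly half of the rectifier outputs are exact zeros, whereas the softplus only approaches zero and never reaches it on these inputs.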
