Gradient descent converges to some local minimum
- Perhaps not the global minimum...
- Add a momentum term to the weight update
- Use stochastic (incremental) gradient descent
- Train multiple nets with different initial weights
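As a minimal sketch of the momentum idea above: the update accumulates a decaying average of past gradients, so the step keeps some velocity and can roll through shallow local minima and flat regions. The function names, learning rate, and momentum coefficient here are illustrative choices, not prescribed by the text.

```python
import numpy as np

def gd_momentum(grad, w0, lr=0.1, beta=0.9, steps=500):
    """Gradient descent with a momentum term.

    v accumulates a decaying average of past gradients; adding it to
    the plain gradient step gives the update "velocity" that helps it
    pass through small local minima and plateaus.
    """
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - lr * grad(w)  # momentum: keep a fraction of the last step
        w = w + v
    return w

# Toy example: minimize f(w) = (w - 3)^2, whose gradient is 2(w - 3)
w_star = gd_momentum(lambda w: 2 * (w - 3), w0=[0.0])
```

Stochastic gradient descent would instead evaluate `grad` on one (or a few) randomly chosen training examples per step; the resulting noise is another way to escape poor local minima.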
Nature of convergence
- Initialize weights near zero
- Therefore, initial networks are near-linear
- Increasingly non-linear functions become possible as training progresses
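The near-linearity claim above can be checked numerically: with small random weights, each unit's net input stays close to zero, where the sigmoid is well approximated by its tangent line sigmoid(x) ≈ 0.5 + x/4. The initialization range below is an illustrative assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Small initial weights (e.g. uniform in [-0.05, 0.05], an assumed range)
# keep the unit's weighted sum of inputs near zero.
w_small = rng.uniform(-0.05, 0.05, size=10)
x = rng.uniform(-1.0, 1.0, size=10)
net = w_small @ x

# Near zero the sigmoid is approximately linear: sigmoid(net) ≈ 0.5 + net/4,
# so the whole initial network computes a nearly linear function of its inputs.
error = abs(sigmoid(net) - (0.5 + net / 4.0))
```

As training increases the weight magnitudes, net inputs leave this linear region and the network can represent increasingly non-linear functions.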