Gradient Descent

To understand gradient descent, consider a simpler linear unit, where

\begin{displaymath}o = w_{0} + w_{1}x_{1} + \cdots + w_{n}x_{n}\end{displaymath}

Let's learn the weights $w_i$ that minimize the squared error

\begin{displaymath}E[\vec{w}] \equiv \frac{1}{2}\sum_{d \in D}(t_{d} - o_{d})^{2}\end{displaymath}

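where $D$ is the set of training examples, $t_d$ is the target output for example $d$, and $o_d$ is the linear unit's output for that example.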
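
To make the two definitions concrete, here is a minimal Python sketch of the linear unit's output and the squared error. The function names `linear_output` and `squared_error`, the representation of each example as an `(x, t)` pair, and the tiny training set are assumptions made for illustration; they are not part of the original notes.

```python
def linear_output(w, x):
    """Return o = w[0] + w[1]*x[0] + ... + w[n]*x[n-1] (w[0] is the bias w_0)."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def squared_error(w, examples):
    """Return E[w] = 1/2 * sum over (x, t) in examples of (t - o)^2."""
    return 0.5 * sum((t - linear_output(w, x)) ** 2 for x, t in examples)


# Hypothetical training set D: (x, t) pairs generated by t = 1 + 2*x_1.
D = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0)]

error_at_zero = squared_error([0.0, 0.0], D)  # all outputs are 0
error_at_fit = squared_error([1.0, 2.0], D)   # outputs match every target
```

With this made-up data, the error is 17.5 at $\vec{w} = (0, 0)$ and 0 at $\vec{w} = (1, 2)$, the weights that reproduce every target. Gradient descent searches for weights like the latter.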