Multilayer Perceptron

As noted in lecture, there are two ingredients for a multilayer perceptron: multiple layers and nonlinear activation functions.

Let me demonstrate why the nonlinear activation function is important. Suppose instead that every unit uses a linear (identity) activation function:

\[ o_i = f(u_i) = u_i = \sum_j w_{ij} x_j \]
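As a quick sanity check, here is a minimal sketch of one such linear unit in NumPy; the weight and input values are made up for illustration:

```python
import numpy as np

# One unit i with a linear activation: its output is just the weighted
# sum of its inputs. The weights w_{ij} and inputs x_j are invented.
w = np.array([0.4, -1.2, 0.7])   # w_{ij} for j = 1, 2, 3
x = np.array([1.0, 0.5, -2.0])   # x_j

u = w @ x    # u_i = sum_j w_{ij} x_j
o = u        # linear activation: f(u_i) = u_i
print(o)     # ≈ -1.6
```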

Then for any MLP, there exists an equivalent single-layer perceptron. For example,

[Figure: a two-layer network with inputs $x_1, x_2$, hidden-layer weights $w_{11}, w_{12}, w_{21}, w_{22}$, and output weights $w_{o1}, w_{o2}$]

we can write out the forward pass as:

\[ o = w_{o1} (x_1 w_{11} + x_2 w_{12}) + w_{o2} (x_1 w_{21} + x_2 w_{22}) \] and rewrite it by pulling out the input terms: \[ o = x_1 (w_{11} w_{o1} + w_{21} w_{o2}) + x_2 (w_{12} w_{o1} + w_{22} w_{o2}) \] which is equivalent to this perceptron:

[Figure: the equivalent single-layer perceptron with weights $v_{o1}, v_{o2}$ connecting $x_1, x_2$ directly to the output $o$]

where \[ v_{o1} = w_{11} w_{o1} + w_{21} w_{o2} \] and \[ v_{o2} = w_{12} w_{o1} + w_{22} w_{o2} \].
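To make the collapse concrete, here is a small NumPy sketch (not from the lecture; the weight values are invented) that builds the two-layer linear network above, collapses it into the single-layer weights $v_{o1}, v_{o2}$, and checks that both give the same output:

```python
import numpy as np

# Hidden-layer weights: row i holds [w_{i1}, w_{i2}]. Output weights
# are [w_{o1}, w_{o2}]. All numeric values are made up for illustration.
W_hidden = np.array([[0.5, -1.0],
                     [2.0,  0.3]])
w_out = np.array([1.5, -0.7])

x = np.array([0.2, 0.9])          # inputs [x_1, x_2]

# Two-layer forward pass with a linear (identity) activation.
hidden = W_hidden @ x             # h_i = sum_j w_{ij} x_j
o_two_layer = w_out @ hidden

# Collapsed single-layer weights: v_{oj} = sum_i w_{oi} w_{ij}.
v = w_out @ W_hidden              # [v_{o1}, v_{o2}]
o_single_layer = v @ x

print(np.allclose(o_two_layer, o_single_layer))  # True
```

The check passes because composing linear maps is itself a linear map, so without a nonlinearity the extra layer adds no representational power.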