Multilayer Perceptron
As noted in lecture, there are two ingredients for a multilayer perceptron: multiple layers and non-linear activation functions.
Let me demonstrate why the non-linear activation function is important. Suppose instead that each unit uses a linear (identity) activation function:
\[ o_i = f(u_i) = u_i = \sum_j w_{ij} x_j \]
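As a minimal sketch (using NumPy, with hypothetical weight values), a unit with a linear activation simply returns the weighted sum of its inputs:

```python
import numpy as np

def linear_unit(weights, x):
    """A single unit with a linear (identity) activation:
    u_i = sum_j w_ij * x_j and f(u_i) = u_i."""
    u = np.dot(weights, x)
    return u  # identity activation: the output is just the weighted sum

# hypothetical example values
print(linear_unit(np.array([0.5, -1.0]), np.array([0.2, 0.9])))  # 0.5*0.2 - 1.0*0.9 = -0.8
```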
Then for any such MLP, there exists an equivalent single-layer perceptron. For example, for a network with two inputs, two hidden units, and one output,
we can write out the forward pass as:
\[ o = w_{o1} (x_1 w_{11} + x_2 w_{12}) + w_{o2} (x_1 w_{21} + x_2 w_{22}) \]
and rewrite it by collecting the input terms:
\[ o = x_1 (w_{11} w_{o1} + w_{21} w_{o2}) + x_2 (w_{12} w_{o1} + w_{22} w_{o2}) \]
which is equivalent to the single-layer perceptron
\[ o = v_{o1} x_1 + v_{o2} x_2 \]
where \[ v_{o1} = w_{11} w_{o1} + w_{21} w_{o2} \] and \[ v_{o2} = w_{12} w_{o1} + w_{22} w_{o2} \] .
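To make this concrete, here is a minimal NumPy sketch (with hypothetical weight values) checking that the two-layer linear network and the collapsed single-layer perceptron produce the same output:

```python
import numpy as np

# Hypothetical weights for the 2-2-1 network above.
W_hidden = np.array([[0.5, -1.0],    # w_11, w_12 (hidden unit 1)
                     [2.0,  0.3]])   # w_21, w_22 (hidden unit 2)
w_out = np.array([1.5, -0.7])        # w_o1, w_o2

x = np.array([0.2, 0.9])             # example input (x_1, x_2)

# Two-layer forward pass with linear (identity) activations.
hidden = W_hidden @ x
o_two_layer = w_out @ hidden

# Collapsed single-layer perceptron: (v_o1, v_o2) = W_hidden^T w_out.
v = W_hidden.T @ w_out
o_single_layer = v @ x

assert np.isclose(o_two_layer, o_single_layer)
print(o_two_layer, o_single_layer)   # same value
```

The collapsed weight vector is just the product of the two weight layers, which is the whole point: composing linear maps gives another linear map, so the extra layer adds no expressive power.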