Neural Networks
What is a neural network? I believe you have seen a picture like this before, perhaps many times.
Multi-layer perceptrons and neural networks are used interchangeably in the literature. More concretely speaking, however, MLPs are a subset of neural networks. In general, if you see a neural network that is feed-forward only, i.e. has no cycles, you are safe to call it a multi-layer perceptron (MLP); otherwise it is a recurrent network. For the purpose of this introduction to neural networks, I will focus mainly on MLPs.
A neural network takes inputs and produces outputs. At a high level, we can think of it as a function that takes inputs and computes outputs using a set of hidden parameters. Let $\theta$ denote the hidden parameters, also known as weights. Let $x$ denote our vector inputs and $y$ denote our vector outputs.

$$y = f(x; \theta)$$
In the simplest example, the neural network can be a linear function.

$$y = Wx + b$$
If the input is a scalar, i.e. a single-dimensional vector, then we have the familiar equation of a line, $y = wx + b$.
However, in practice, a neural network does not resemble a line because each layer has a non-linear activation function. Before we talk about non-linearity, let's focus a bit on the linear side first.
Recall that matrix multiplication represents a linear transformation of a vector. In case you need a little brush-up on linear algebra, you should take a look at 3Blue1Brown's lecture series on Essence of Linear Algebra.
Suppose I want to apply a rotational transformation to a vector via my neural network: my $W$ would be a rotation matrix and $b$ a zero vector. Let me use the following notation to describe my transformation, for better readability alongside my Python code.

$$y = R(\theta)\,x \quad \text{where} \quad R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

I want to rotate my vector $x$ by 90 degrees, which is $\frac{\pi}{2}$ in radians.
Then

$$R\left(\tfrac{\pi}{2}\right) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$

Then, for a horizontal input vector $x = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, the output will be

$$y = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
I have rotated my horizontal vector 90 degrees and now it is a vertical vector on a Cartesian plane.
The example above is commonly referred to as an affine transformation. The $b$ is known as the offsets, or biases. The generalized form would be described by the following expression.

$$f(x) = Wx + b$$
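To make this concrete, here is a minimal NumPy sketch of the rotation above; the variable names and the example vector are my own choices for illustration.

```python
import numpy as np

# Rotate a vector by 90 degrees (pi/2 radians) as an affine transformation
# y = Wx + b, with W a rotation matrix and b fixed to the zero vector.
theta = np.pi / 2
W = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])
b = np.zeros(2)

x = np.array([1.0, 0.0])  # a horizontal vector on the Cartesian plane
y = W @ x + b

print(y)  # approximately [0, 1]: the vector is now vertical
```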
The power of a neural network is its flexibility: it can model any function, linear or nonlinear. An affine transformation only allows us to model linear functions; we are missing a nonlinear ingredient in our neural network to model nonlinear functions. That ingredient is a nonlinear transformation, also commonly known as a nonlinear activation.
A classical choice is the sigmoid activation.

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
However, the more popular choice for very deep neural networks is the rectified linear unit, also known as ReLU.

$$\text{ReLU}(x) = \max(0, x)$$
When we apply $\text{sigmoid}(\cdot)$ to a matrix, the operation is performed element-wise.
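As a quick sketch (the matrix values are made up for illustration), here are both activations applied element-wise in NumPy:

```python
import numpy as np

def sigmoid(z):
    # 1 / (1 + e^(-z)), applied element-wise; squashes each entry into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # max(0, z), applied element-wise; clips negative entries to 0.
    return np.maximum(0.0, z)

Z = np.array([[-2.0, 0.0],
              [ 1.0, 3.0]])

print(sigmoid(Z))
print(relu(Z))
```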
With an affine transformation and a sigmoid activation, we can create our first layer of a neural network. It's usually helpful to think of each transformation/activation as a computational unit, known as a gate. We can simplify the mental model by imagining that inputs pass through different gates.
Mathematically speaking,

$$h = \sigma(Wx + b)$$
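Viewed as gates, a single layer might look like the following sketch; the weight and bias values here are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.0])        # input vector
W = np.array([[0.5, -1.0],
              [2.0,  0.3]])     # made-up weights
b = np.array([0.1, -0.2])       # made-up biases

a = W @ x + b    # gate 1: affine transformation
h = sigmoid(a)   # gate 2: nonlinear activation
print(h)
```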
We generally call the first layer of a neural network the input layer, subsequent layers hidden layers, and the final layer the output layer. Let $N$ denote the number of inputs, $D$ denote our input dimension, and $H$ denote our hidden dimension, which is the dimension of the subsequent hidden layer. Although my example is a single vector input, i.e. $N = 1$, in practice you can feed in as many inputs at once as you want to maximize parallel computation.
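For instance, a batched layer with these shapes might look like this sketch; the sizes are arbitrary placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

N, D, H = 4, 3, 5                # number of inputs, input dim, hidden dim
rng = np.random.default_rng(0)

X = rng.normal(size=(N, D))      # N input vectors stacked as rows
W = rng.normal(size=(D, H))      # weights
b = np.zeros(H)                  # biases

hidden = sigmoid(X @ W + b)      # affine transformation + activation
print(hidden.shape)              # (4, 5): one H-dimensional vector per input
```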
We can simply stack many layers together to produce an actually useful neural network. Feeding an input to a layer and propagating the output to the next layer as its input is known as forward propagation. Let $O$ denote our output dimension.
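Here is a sketch of forward propagation through a two-layer network with randomly initialized, i.e. untrained, weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

N, D, H, O = 4, 3, 5, 2          # batch, input, hidden, output dimensions
rng = np.random.default_rng(42)

X = rng.normal(size=(N, D))
W1, b1 = rng.normal(size=(D, H)), np.zeros(H)
W2, b2 = rng.normal(size=(H, O)), np.zeros(O)

hidden = sigmoid(X @ W1 + b1)    # layer 1: input -> hidden
Y = hidden @ W2 + b2             # layer 2: hidden -> output

print(Y.shape)                   # (4, 2): one O-dimensional output per input
```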
Although our neural network can produce outputs, it does not produce any meaningful output because it lacks training. We need to give it the ability to learn, hence the term machine learning.
Suppose you are given a set of data points and you want to fit a curve that best represents the trend of the data. How would you do it algorithmically? Here's a simple linear example.
Since we know the data points were generated linearly, we can simply fit a straight line to them. Recall that the equation of a line is $y = wx + b$. We need a way to figure out the values of $w$ and $b$. A naive approach is to guess.
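A minimal sketch of that naive guessing approach; the data here is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Made-up linear data: y = 2x + 1 plus a little noise.
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

# Guess w and b at random, and keep the guess with the lowest error.
best_w, best_b, best_err = 0.0, 0.0, np.inf
for _ in range(10000):
    w = rng.uniform(-10.0, 10.0)
    b = rng.uniform(-10.0, 10.0)
    err = np.mean((w * x + b - y) ** 2)  # mean squared error
    if err < best_err:
        best_w, best_b, best_err = w, b, err

print(best_w, best_b)  # should land somewhere near w = 2, b = 1
```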