Across the web we can find many explanations of how to size a neural network: how many hidden layers to add, how many neurons each hidden layer should contain… Some methods, such as learning curves, can help us decide how to modify a network to avoid over-fitting or to reduce bias. But in the end, we don’t really understand the role of each neuron in the network.
After reading this article, you will be able to (for a classification problem involving 2 classes in a 2D dataset):
- Understand exactly what each neuron does,
- Size the minimal neural network which fits your data…
…and that without any mathematical equation.
It also means that if you remove a single neuron from this network, it won’t be able to fit the dataset anymore. Wonderful, isn’t it?
For whom is this post suitable?
If you already have some knowledge of neural networks and really want to know what happens internally, this article is perfectly suited to you!
If you know nothing (Jon Snow) about neural nets, that’s fine too. You will still understand this post easily.
But first, let’s play a little bit!
To help me in my task, I will make extensive use of the wonderful tool TensorFlow Playground. If you don’t know it, I invite you to play with it a little bit. Try changing the dataset, adding some hidden layers, adding some neurons to each layer, modifying the activation function… and see what happens!
Step 1: Linearly separable dataset
On the playground, select the only linearly separable dataset.
In simple terms, in our 2D case, this means that the two classes can be separated with a straight line.
In our case, each class is represented by a color. We have two colors (orange and blue), so we have two classes.
The purpose of our neural network is to separate these two classes as well as possible.
So… click here:
Then remove all the hidden layers by clicking here:
Then click on the Play button.
After a few seconds, you should see something like this:
The blue zone covers the blue points, and the orange zone covers the orange points.
It seems that our model is perfectly able to classify our training set!
Smart question #1: ‘Hey, in a lot of videos and articles, people talk about very complex networks with many hidden layers and many neurons in each layer. Here, we have only two neurons as input, one neuron as output and no hidden layer at all, and the network seems to work perfectly. Why?’
Let’s analyse this case right now.
The top input neuron, x1, represents the horizontal axis. This neuron is able to divide the space horizontally into two subspaces. The boundary between these two subspaces is a vertical line. This horizontal division is represented on the playground by this little icon:
Likewise, the bottom input neuron, x2, represents the vertical axis. This neuron is able to divide the space vertically into two subspaces. The boundary between these two subspaces is a horizontal line. This vertical division is represented on the playground by this little icon:
The output neuron does only one thing (and does it well): it mixes the outputs of the x1 and x2 neurons. And what do you get if you mix a horizontal line with a vertical line?
A diagonal line, like this:
Depending on the dataset, the output neuron will give more weight to the output of the x1 neuron, making the separation line more vertical, or to the output of the x2 neuron, making it more horizontal. If the output neuron gives exactly the same weight to both, the separation line will be perfectly diagonal.
To understand this notion of “mixing” a little better, let’s use another amazing tool: Google Sheets.
The following figures represent x1 and x2.
Well, I said I wouldn’t write any equations in this post, but let me write one (small) equation.
I promise, it will be the only one.
If we simplify a lot: output neuron = C1 * x1 + C2 * x2.
C1 and C2 are computed by the neural network itself during its training phase.
All the “intelligence” of the network is in these weights.
If C1 = C2 = 1, we are in the following situation: output neuron = x1 + x2.
If the weight associated with x1 is higher than the one associated with x2 (let’s say C1 = 2 and C2 = 1), we are in the following situation: output neuron = 2 * x1 + x2.
The separation line is more vertical.
Weights can also be negative, like here (C1 = -2 and C2 = 1):
In this case, the separation line slopes in the opposite direction.
You now have a rough idea of how the output neuron mixes the outputs of the neurons of the input layer.
(Advanced readers will notice that I simplified a lot and left out the bias and the activation function. This is on purpose, to avoid overcomplicating this post.)
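For readers who like to see things run, the mixing can be sketched in a few lines of Python. This is only an illustration under the same simplifications as the text (no bias, no activation function). The function names are made up for this example, and calling the positive side of the line "blue" is an assumption, not something taken from the playground.

```python
def output_neuron(x1, x2, c1, c2):
    """Simplified output neuron: a weighted sum of the two inputs."""
    return c1 * x1 + c2 * x2


def predicted_color(x1, x2, c1, c2):
    """Assumed convention: the positive side of the line is blue, the rest orange."""
    return "blue" if output_neuron(x1, x2, c1, c2) > 0 else "orange"


def separation_slope(c1, c2):
    """The separation line is where c1 * x1 + c2 * x2 = 0, i.e. x2 = -(c1 / c2) * x1."""
    return -c1 / c2


# The three weight pairs discussed in the text.
for c1, c2 in [(1, 1), (2, 1), (-2, 1)]:
    print(f"C1 = {c1}, C2 = {c2}: separation line x2 = {separation_slope(c1, c2)} * x1")
```

With C1 = 2 the line is steeper (closer to vertical) than with C1 = 1, and with C1 = -2 it leans the other way.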
Smart question #2: ‘If we simplify our neural net to the extreme by keeping only one input neuron, for example x1 and nothing else, then the output neuron will mix the output of the x1 neuron with… nothing. So the separation line will be a vertical line, won’t it?’
Yes, you’re right! Here is the visual proof:
(Note for advanced readers: a neural network with no hidden layer and a single output neuron with a sigmoid activation is equivalent to logistic regression.)
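To make this equivalence a bit more concrete, here is a minimal sketch of such a network trained by hand. It assumes a sigmoid output neuron, a made-up dataset of two well-separated blobs, and plain stochastic gradient descent on the log loss. None of these choices come from the playground itself; they are just one simple way to do it.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_two_blobs(n_points, seed=0):
    """Hypothetical dataset: label 1 clustered near (2, 2), label 0 near (-2, -2)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        label = rng.randint(0, 1)
        center = 2.0 if label == 1 else -2.0
        points.append((rng.gauss(center, 0.5), rng.gauss(center, 0.5), label))
    return points


def train(points, epochs=100, learning_rate=0.1):
    """Stochastic gradient descent on the log loss of sigmoid(c1*x1 + c2*x2 + bias)."""
    c1 = c2 = bias = 0.0
    for _ in range(epochs):
        for x1, x2, label in points:
            # For the log loss, the gradient w.r.t. the weighted sum is (prediction - label).
            error = sigmoid(c1 * x1 + c2 * x2 + bias) - label
            c1 -= learning_rate * error * x1
            c2 -= learning_rate * error * x2
            bias -= learning_rate * error
    return c1, c2, bias


def accuracy(points, c1, c2, bias):
    """Fraction of points on the correct side of the line c1*x1 + c2*x2 + bias = 0."""
    correct = sum((c1 * x1 + c2 * x2 + bias > 0) == (label == 1) for x1, x2, label in points)
    return correct / len(points)


points = make_two_blobs(100)
c1, c2, bias = train(points)
print("weights:", c1, c2, "bias:", bias, "accuracy:", accuracy(points, c1, c2, bias))
```

The learned C1 and C2 play exactly the role of the weights in the small equation above; the bias simply lets the line move away from the origin.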
We just saw visually that our neural network with x1 and x2 as inputs and no hidden layer is able to separate the space into two subspaces, and that the boundary between these two subspaces is a line.
Here, our network works well because the two classes of our dataset can fortunately be split with a line. As written earlier, we say that the dataset is linearly separable.
Smart question #3: ‘But what happens if we try our network on a non-linearly separable dataset, like this one?’
Step 2: Non-linearly separable dataset
With the following dataset,
we could expect that the best shape to separate the blue and orange classes is a circle, as in the following picture:
How will our previous network behave if we ask it to separate these two classes?
Well, it’s an epic fail… Our network tries to separate these two classes with a single (straight) line. It does its best, but the result is not good at all.
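We can check this failure without training anything. The sketch below builds a toy version of the dataset (an inner ring of one class surrounded by an outer ring of the other, which is my own simplification of the playground's circle dataset) and then tries a whole grid of straight lines by brute force, keeping the best accuracy found. The radii, grid sizes and function names are arbitrary choices for the illustration.

```python
import math


def ring(radius, n_points, label):
    """Evenly spaced points on a circle of the given radius, all with the same label."""
    return [
        (radius * math.cos(2 * math.pi * k / n_points),
         radius * math.sin(2 * math.pi * k / n_points),
         label)
        for k in range(n_points)
    ]


# Label 1 for the inner ring, label 0 for the outer ring.
points = ring(1.0, 40, 1) + ring(3.0, 40, 0)


def line_accuracy(points, c1, c2, bias):
    """Accuracy of the rule 'label 1 when c1*x1 + c2*x2 + bias > 0'."""
    correct = 0
    for x1, x2, label in points:
        predicted = 1 if c1 * x1 + c2 * x2 + bias > 0 else 0
        correct += predicted == label
    return correct / len(points)


def best_line_accuracy(points, n_angles=72, n_offsets=81):
    """Try lines in many directions and at many offsets, return the best accuracy found."""
    best = 0.0
    for a in range(n_angles):
        angle = 2 * math.pi * a / n_angles
        c1, c2 = math.cos(angle), math.sin(angle)
        for o in range(n_offsets):
            bias = -4.0 + 8.0 * o / (n_offsets - 1)
            best = max(best, line_accuracy(points, c1, c2, bias))
    return best


best = best_line_accuracy(points)
print("best accuracy found with a single straight line:", best)
```

Whatever line we pick, a large share of the points ends up on the wrong side, which is what the playground shows visually.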
Now let me ask a question: what is the minimum number of (straight) lines needed to separate the blue points from the orange points?
The answer is (drum roll)… three! And these three lines will form a… triangle, like here:
Well, it’s not perfect because some orange points are inside the triangle. Still, the result is not so bad. How could our neural network draw such a triangle?
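Before bringing the network back in, here is the triangle idea as plain geometry. The three lines below are picked by hand (an equilateral triangle whose edges touch a circle of radius 1 centered on the origin); they are an assumption for the example, not values learned by any network. A point is declared "inside" when it lies on the inner side of all three lines.

```python
import math

# Each line is (c1, c2, bias); a point is on its inner side when c1*x1 + c2*x2 + bias > 0.
# Hypothetical choice: an equilateral triangle whose three edges touch the circle of radius 1.
TRIANGLE_LINES = [
    (0.0, 1.0, 1.0),             # bottom edge: x2 > -1
    (-math.sqrt(3), -1.0, 2.0),  # right edge
    (math.sqrt(3), -1.0, 2.0),   # left edge
]


def inner_side(line, x1, x2):
    """True when the point is on the inner side of one line."""
    c1, c2, bias = line
    return c1 * x1 + c2 * x2 + bias > 0


def inside_triangle(x1, x2):
    """True when the point is on the inner side of all three lines."""
    return all(inner_side(line, x1, x2) for line in TRIANGLE_LINES)


print(inside_triangle(0.0, 0.0))  # center of the triangle
print(inside_triangle(0.0, 1.5))  # outside the circle of radius 1, but in a corner of the triangle
print(inside_triangle(3.0, 0.0))  # far to the right
```

The second point illustrates the imperfection mentioned above: the corners of the triangle stick out beyond the circle, so points sitting there are still counted as inside.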
Now begins the interesting part of this post. We will introduce a hidden layer in the network.
We need our network to be able to create three separation lines to form a triangle, so we will add three neurons to the hidden layer, like this:
What is the purpose of each of these hidden neurons?
Each neuron of the hidden layer is able to mix the outputs of all the neurons of the input layer. (In our case we have two neurons in the input layer.) So each hidden neuron is able to separate the space into two subspaces with a line.
In fact, each neuron of the hidden layer behaves exactly like the output neuron of our first neural network (the one from Step 1).
Finally, just as each neuron of the hidden layer is able to mix the outputs of the neurons of the input layer, the neuron of the output layer is able to mix the outputs of the three neurons of the hidden layer.
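Here is what that two-stage mixing could look like as code. This is a hand-wired sketch, not a trained network: the weights are chosen by me so that the three hidden neurons draw a triangle around the origin, and I assume tanh as the activation function (one of the options the playground offers). The large weights just make each hidden neuron behave almost like an on/off switch.

```python
import math

# Hypothetical hand-picked weights: one row (c1, c2, bias) per hidden neuron.
# Each row defines one line; together the three lines form a triangle around the origin.
HIDDEN_WEIGHTS = [
    (0.0, 10.0, 10.0),
    (-10.0 * math.sqrt(3), -10.0, 20.0),
    (10.0 * math.sqrt(3), -10.0, 20.0),
]
# The output neuron gives the same weight to each hidden neuron.
OUTPUT_WEIGHTS = (1.0, 1.0, 1.0)
OUTPUT_BIAS = -2.0


def hidden_layer(x1, x2):
    """Each hidden neuron mixes x1 and x2: close to +1 on one side of its line, -1 on the other."""
    return [math.tanh(c1 * x1 + c2 * x2 + bias) for c1, c2, bias in HIDDEN_WEIGHTS]


def network(x1, x2):
    """The output neuron mixes the three hidden outputs.

    The mixed value is close to 3 only when all three hidden neurons say 'inner side',
    so subtracting 2 makes the output positive inside the triangle and negative outside.
    """
    hidden = hidden_layer(x1, x2)
    mixed = sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden))
    return math.tanh(mixed + OUTPUT_BIAS)


for point in [(0.0, 0.0), (0.0, 3.0), (3.0, 0.0), (0.0, -3.0)]:
    print(point, "->", network(*point))
```

A positive output means "inside the triangle", a negative one means "outside". A trained network would find its own weights, but the division of labor between the layers is the same.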
Let’s click on the “play” button and see what happens:
Bazinga! Our network behaves as predicted. To get a better view of the role of each hidden neuron in the network, you can hover the mouse over it in the playground. The output of the hovered neuron will be displayed in the big square on the right, in place of the output neuron’s.
Smart question #4: ‘So, if we have a hidden layer with six neurons, the output neuron will form a six-sided polygon?’
You’re more or less right. More precisely, the output neuron will form a polygon with at most six edges. (So, if a triangle is enough, there is no guarantee that the output neuron will form a six-edged polygon. After all, the purpose of the network is not to draw geometrical shapes, but to separate two classes as well as it can.)
With six hidden neurons, we got this output shape (with six edges; you can count them yourself):
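To get a feeling for why more edges help, here is a small geometric sketch. It builds a polygon from n lines that all touch a circle of radius 1 (a hypothetical, perfectly regular arrangement that a real network is under no obligation to find) and measures which share of a surrounding ring of points wrongly falls inside the polygon.

```python
import math


def tangent_lines(n_edges):
    """n_edges lines (c1, c2, bias), evenly spaced, each touching the circle of radius 1."""
    lines = []
    for k in range(n_edges):
        angle = 2 * math.pi * k / n_edges
        # Inner side of the line: c1*x1 + c2*x2 + bias > 0.
        lines.append((-math.cos(angle), -math.sin(angle), 1.0))
    return lines


def inside(lines, x1, x2):
    """True when the point is on the inner side of every line."""
    return all(c1 * x1 + c2 * x2 + bias > 0 for c1, c2, bias in lines)


def share_of_ring_inside(lines, radius, samples=360):
    """Share of evenly spaced points on a circle of the given radius that fall inside the polygon."""
    hits = 0
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        hits += inside(lines, radius * math.cos(angle), radius * math.sin(angle))
    return hits / samples


for n_edges in (3, 6):
    share = share_of_ring_inside(tangent_lines(n_edges), radius=1.5)
    print(f"{n_edges} edges: share of the ring at radius 1.5 inside the polygon = {share}")
```

With three edges, part of the ring at radius 1.5 slips into the corners of the triangle; with six edges, the polygon hugs the circle closely enough that none of it does.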
Now, you should have a better understanding of the role of each neuron in a neural network.
Of course, this post covers only very simple cases, but it is not so hard to imagine what happens in a 3D situation (with x1, x2 and x3 as inputs): each neuron is still able to split the space into two subspaces, but the neurons just after the input layer separate them with a plane instead of a line. In four dimensions, it’s harder to imagine…
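Even if it is hard to picture, the computation itself does not change with the number of dimensions. The sketch below is a generic version of the weighted sum from Step 1, with a bias added; the function name and the example weights are made up for the illustration.

```python
def neuron_side(inputs, weights, bias):
    """Return +1 or -1 depending on the side of the boundary, 0 exactly on it."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return (total > 0) - (total < 0)


# Hypothetical 3D neuron: its boundary is the plane x1 + x2 + x3 = 3.
weights_3d, bias_3d = (1, 1, 1), -3
print(neuron_side((2, 2, 2), weights_3d, bias_3d))  # one side of the plane
print(neuron_side((0, 0, 0), weights_3d, bias_3d))  # the other side
print(neuron_side((1, 1, 1), weights_3d, bias_3d))  # exactly on the plane

# The same function works unchanged with four inputs, even if we cannot draw the boundary.
print(neuron_side((1, 0, 0, 0), (1, 1, 1, 1), -0.5))
```

In 2D the boundary is a line, in 3D a plane, and beyond that the same idea continues, one extra weight per extra input.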
Smart question #5: ‘Wait a minute: your network has only one hidden layer. So why do we sometimes need a lot of hidden layers to classify images like cats and dogs?’
Good question! Actually, it is mathematically proven (this is the universal approximation theorem) that any continuous function on a bounded region (including functions which try to separate blue points from orange points…) can be approximated as closely as we want by a neural network with only one hidden layer, provided it has enough neurons. If you want, you can read this very well written article on this topic.
So why do we sometimes need several hidden layers?
Because single-hidden-layer neural nets are sometimes hard to train, especially when there are a lot of neurons in the hidden layer. Even if these kinds of networks can in theory be trained, they can require a lot of computational power and a lot of time. That’s why techniques like deep learning have been developed.
And to conclude this post, let me give you some useful links:
- A very nice book, Neural Networks and Deep Learning, easy to read and written by Michael Nielsen.
- THE MOOC to take if you want to understand the basics of Machine Learning: CS229 by Andrew Ng.
- THE MOOC to take if you want to understand the basics of Deep Learning: fast.ai by Jeremy Howard and Rachel Thomas.