A Simple Artificial Neural Network With Python
The idea behind this article is to build a neural network from scratch. There are plenty of tutorials and excellent articles on the internet and in textbooks. What differs here is the methodology: I will detail the calculations as much as possible by connecting the algebra, the matrix formulas and the Python code. This approach helped me, at least, to understand how a simple ANN works under the hood.
Let’s begin with a simple network of 2 inputs, 2 hidden nodes and 2 outputs. Our task here is to train a neural network to complete a classification task. Our ultimate goal is to find the best weights, i.e. the slopes of the connections between nodes, that minimise the total error. For that, the learning method is split into two propagations: a forward propagation, which computes the outputs, and a backward propagation, which uses the error to update the weights.
An artificial neural network is similar to a biological network. Our brain sends signals through connected neurons until a final decision is made. Each neuron receives an electrical signal from all the neurons transmitting to it. If the signal is too small, it is not passed on to the next neuron. The brain uses only the most relevant signals to propagate around the network. For an ANN, the logic is similar. The network takes an input, sends it to all connected nodes and transforms the signal with an activation function. Only the signals above a threshold go on to the next layer.
Figure 1 illustrates this idea. The neuron is decomposed into an input part and an activation function. The left part receives all the inputs from the previous layer and sums them. The right part passes that sum through an activation function. In this article, we will use the sigmoid function because of its convenient mathematical properties: it is smooth and its derivative is easy to compute.
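As a minimal sketch of the activation function, the snippet below writes the sigmoid in plain NumPy together with its derivative. The helper names sigmoid and sigmoid_derivative are illustrative and not part of the article's code; the sigmoid is assumed to be equivalent to the scipy.special.expit call used later on.

```python
import numpy as np


def sigmoid(x):
    """Sigmoid activation: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_derivative(x):
    """Derivative of the sigmoid, written in terms of the sigmoid itself."""
    s = sigmoid(x)
    return s * (1.0 - s)


x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))             # values squashed into the open interval (0, 1)
print(sigmoid_derivative(x))  # largest at x = 0, where it equals 0.25
```

The fact that the derivative can be expressed from the output alone is what keeps the weight-update formulas short.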
Our network looks like picture 2. The first layer holds the input values $i_1$ and $i_2$; the second layer, called the hidden layer, receives the weighted input from the previous layer. We use the notation $w_{ij}$ for the links between layer 1 and layer 2. For instance, the first hidden node receives the signal from the weighted inputs $w_{11} i_1$ and $w_{12} i_2$. The third layer is the output layer; its links with the hidden layer are written as the $w_{jk}$ relationship. From figure 3, we can see that the weight from the first node of the hidden layer to the first node of the output layer is $w_{11} = 0.40$.
For simplicity, let’s write down all the weights. Note that we deliberately do not include a bias.
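| Input layer $i_i$ | Hidden layer weights $w_{ij}$ | Output layer weights $w_{jk}$ | Target output $o_k$ |
|---|---|---|---|
| $i_1 = 0.05$ | $w_{11} = 0.15$, $w_{21} = 0.25$ | $w_{11} = 0.40$, $w_{21} = 0.50$ | $o_1 = 0.01$ |
| $i_2 = 0.10$ | $w_{12} = 0.20$, $w_{22} = 0.30$ | $w_{12} = 0.45$, $w_{22} = 0.55$ | $o_2 = 0.99$ |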
To train our neural network with Python, we follow the methodology described above and write three functions. We follow the structure used in Tariq Rashid's book.
initialisation: sets the number of input, hidden and output nodes
train: refines the weights after being given a training set example to learn from
query: gives an answer from the output nodes after being given an input
    # neural network class definition
    class neuralNetwork:

        # initialise the neural network
        def __init__(self):
            pass

        # train the neural network
        def train(self):
            pass

        # query the neural network
        def query(self):
            pass
We can initialize the network with the following code. We need to import the numpy library and the scipy.special module. We deliberately avoid the Keras and TensorFlow libraries because we aim to understand what is going on behind the code.
    import numpy as np
    import scipy.special

    class neuralNetwork:

        def __init__(self, inputnodes, hiddennodes, outputnodes, learningrate):
            # set number of nodes in each input, hidden, output layer
            self.inodes = inputnodes
            self.hnodes = hiddennodes
            self.onodes = outputnodes

            # link weight matrices, wih (input -> hidden) and who (hidden -> output),
            # hard-coded to the example weights; plain arrays are used because
            # np.dot already performs the matrix products
            self.wih = np.array([[0.15, 0.20], [0.25, 0.30]])
            self.who = np.array([[0.40, 0.45], [0.50, 0.55]])

            # learning rate
            self.lr = learningrate

            # activation function is the sigmoid function
            self.activation_function = lambda x: scipy.special.expit(x)
Our neural network looks like picture 3. We are ready to move to the forward propagation.
The first step of training the neural network is to compute the outputs. We begin with the computation of the hidden layer's output. Then, we calculate the output generated by the network, which will be compared with the target output. The third layer’s outputs are generated by taking the weighted output of the hidden layer and plugging it into the activation function, i.e. the sigmoid function.
We split the computation into three distinct parts. The first part is the left part of the node, which is only the sum of the weighted inputs. With two nodes, the computation is light and can easily be done by hand. However, if we connect hundreds of nodes together, it becomes time-consuming and error-prone. Instead of algebra, we can rewrite the equations in matrix form so that a computer program can evaluate them. In our case, we use Python's numpy library. The matrix formula is simply $W \cdot I$, which multiplies the weight matrix $W$ by the input vector $I$. Note that the dot means matrix multiplication. The third column details the inputs and weights in matrix form.
We can plug the sum of the weighted input into the activation function.
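The hidden-layer step can be checked numerically. The sketch below takes the example inputs and input-to-hidden weights, computes each weighted sum by hand and then again with np.dot. The sigmoid helper is written in plain NumPy and is assumed to be equivalent to the expit call used in the class.

```python
import numpy as np


def sigmoid(x):
    # plain-NumPy sigmoid, assumed equivalent to scipy.special.expit
    return 1.0 / (1.0 + np.exp(-x))


inputs = np.array([[0.05], [0.10]])            # column vector (i1, i2)
wih = np.array([[0.15, 0.20], [0.25, 0.30]])   # input -> hidden weights

# algebra: one weighted sum per hidden node
h1_in = 0.15 * 0.05 + 0.20 * 0.10
h2_in = 0.25 * 0.05 + 0.30 * 0.10

# matrix form: the same sums in a single call
hidden_inputs = np.dot(wih, inputs)
hidden_outputs = sigmoid(hidden_inputs)

print(h1_in, h2_in)            # 0.0275 and 0.0425, up to floating-point rounding
print(hidden_inputs.ravel())   # same two values from the matrix product
print(hidden_outputs.ravel())  # both slightly above 0.5
```

Each row of the weight matrix holds the weights feeding one hidden node, which is why a single matrix product reproduces both sums.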
The Python code used to produce the output comes directly from the query function. We can do that because we have already initialized the weights.
We need to transpose the input list into a column vector. This is easily done with the .T attribute. Numpy has an easy way to compute a matrix product: np.dot. It takes two arguments, which correspond to matrix A and matrix B. We wrote the matrix formula in column two. The Python script is simply np.dot(self.wih, inputs). The final output is calculated using the activation_function declared before.
    def query(self, inputs_list):
        # convert inputs list to a 2d column vector
        inputs = np.array(inputs_list, ndmin=2).T

        # calculate signals into hidden layer
        hidden_inputs = np.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)

        # calculate signals into final output layer
        final_inputs = np.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)

        return final_outputs
We can turn to the final step of the forward propagation. We know the weight of the connected nodes and the output of the previous layers.
Let’s plug everything into the activation function. The final outputs are:
We can check that it works with the following commands:
    # number of input, hidden and output nodes
    input_nodes = 2
    hidden_nodes = 2
    output_nodes = 2

    # learning rate is 1
    learning_rate = 1

    # create instance of neural network
    n = neuralNetwork(input_nodes, hidden_nodes, output_nodes, learning_rate)
    print(n.query([0.05, 0.1]))
With the weights hard-coded in the initialisation, you should see a 2×1 result of approximately 0.6065 and 0.6305.
That’s it. We have computed the output, and we can now use the error to update the weights of our network.
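As a cross-check that does not depend on the class, the forward pass can be reproduced with two matrix products. This is a minimal sketch: the forward helper and the plain-NumPy sigmoid are illustrative names introduced here, assumed to behave like query and scipy.special.expit.

```python
import numpy as np


def sigmoid(x):
    # plain-NumPy sigmoid, assumed equivalent to scipy.special.expit
    return 1.0 / (1.0 + np.exp(-x))


def forward(wih, who, inputs_list):
    """Return (hidden_outputs, final_outputs) as column vectors."""
    inputs = np.array(inputs_list, ndmin=2).T
    hidden_outputs = sigmoid(np.dot(wih, inputs))
    final_outputs = sigmoid(np.dot(who, hidden_outputs))
    return hidden_outputs, final_outputs


wih = np.array([[0.15, 0.20], [0.25, 0.30]])   # input -> hidden weights
who = np.array([[0.40, 0.45], [0.50, 0.55]])   # hidden -> output weights

hidden_outputs, final_outputs = forward(wih, who, [0.05, 0.10])
print(final_outputs.ravel())
```

Printing the intermediate hidden_outputs as well makes it easy to compare every step with the hand calculation.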
The backward propagation aims to minimize the error obtained at each layer by updating the weights. We are interested in how much the error changes after a change in the weights. As in the forward propagation, we need to treat the connections between each pair of layers separately. We begin by updating the weights between the hidden layer and the output layer so as to reduce the total error of our model. Our task is to use gradient descent to find the weights that minimize the error function. We need the derivative of the error function to see the sign of the slope. A positive slope means that we need to move in the opposite direction: we decrease the weight to approach the minimum of the function. The error function is $E = \sum_k (\text{target}_k - \text{output}_k)^2$.
The error function we are trying to minimize is the sum of the squared differences between the target and actual values. Since the output $o_k$ only depends on the weights $w_{jk}$ linked to it, only that node's term matters when we compute the derivative with respect to $w_{jk}$. For the details of the derivative, you can check this website.
The last line describes the slope of the error function.
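For reference, the derivation can be sketched as follows. The symbols are assumptions introduced here: $t_n$ is the target and $o_n$ the actual output of output node $n$, $h_j$ is the output of hidden node $j$, and $\sigma$ is the sigmoid. The sketch assumes the squared-error function described in the text, without a 1/2 factor.

```latex
\begin{aligned}
E &= \sum_n (t_n - o_n)^2, \qquad o_k = \sigma\Big(\sum_j w_{jk}\, h_j\Big) \\
\frac{\partial E}{\partial w_{jk}}
  &= \frac{\partial}{\partial w_{jk}} (t_k - o_k)^2
   = -2\,(t_k - o_k)\,\frac{\partial o_k}{\partial w_{jk}} \\
  &= -2\,(t_k - o_k)\; o_k\,(1 - o_k)\; h_j
\end{aligned}
```

The constant factor 2 only scales the step and can be absorbed into the learning rate. What remains, $(t_k - o_k)\, o_k (1 - o_k)\, h_j$, matches the expression output_errors * final_outputs * (1.0 - final_outputs) multiplied by the transposed hidden outputs in the article's code.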
We can solve the function to see how much the weight should change to minimize the error.
The error is:
The last step involves updating the weights. This is easily done by taking the old weight and subtracting the gradient, which moves the weight against the slope. We usually multiply the gradient by a learning rate to moderate the strength of this change and make sure the algorithm finds a minimum. For the purpose of this article, suppose the learning rate is equal to 1.
The updated weights are:
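The output-layer update can be reproduced numerically. The sketch below assumes the example inputs, targets and weights, a learning rate of 1, and a plain-NumPy sigmoid in place of expit. The names delta_output, who_step and who_updated are illustrative.

```python
import numpy as np


def sigmoid(x):
    # plain-NumPy sigmoid, assumed equivalent to scipy.special.expit
    return 1.0 / (1.0 + np.exp(-x))


learning_rate = 1.0
inputs = np.array([[0.05], [0.10]])
targets = np.array([[0.01], [0.99]])
wih = np.array([[0.15, 0.20], [0.25, 0.30]])
who = np.array([[0.40, 0.45], [0.50, 0.55]])

# forward pass
hidden_outputs = sigmoid(np.dot(wih, inputs))
final_outputs = sigmoid(np.dot(who, hidden_outputs))

# error at the output layer: target - actual
output_errors = targets - final_outputs

# error scaled by the sigmoid slope, then spread over the hidden outputs
delta_output = output_errors * final_outputs * (1.0 - final_outputs)
who_step = learning_rate * np.dot(delta_output, hidden_outputs.T)

# adding this step is the same as subtracting the gradient of the error
who_updated = who + who_step
print(output_errors.ravel())
print(who_updated)
```

The first output is above its target, so the weights feeding it shrink; the second is below its target, so its weights grow.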
We need to write our last function in order to train our network. Let’s write the code in two steps. First, we update the output layer's weights. Second, we update the hidden layer’s weights. We will see how to update those weights shortly.
We need three matrices: the output errors, the final outputs and the hidden outputs.
    def train(self, inputs_list, targets_list):
        # convert inputs and targets lists to 2d column vectors
        inputs = np.array(inputs_list, ndmin=2).T
        targets = np.array(targets_list, ndmin=2).T

        # calculate signals into hidden layer
        hidden_inputs = np.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = np.asarray(self.activation_function(hidden_inputs))

        # calculate signals into final output layer
        final_inputs = np.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = np.asarray(self.activation_function(final_inputs))

        # output layer error is the (target - actual)
        output_errors = np.asarray(targets - final_outputs)

        # update the weights for the links between the hidden and output layers
        self.who += self.lr * np.dot(
            output_errors * final_outputs * (1.0 - final_outputs),
            np.transpose(hidden_outputs))
Next, we tackle the second part of the backward pass: calculating the new weights of the hidden layer, which requires the error slope between the input and hidden layers. This stage is slightly different from the previous one. The output of each hidden layer neuron contributes to the output (and therefore the error) of multiple output neurons. The error at a hidden node therefore takes into account the changes of both outputs.
We can then work out how the error changes after a change in the weight.
We now have our updated weights.
The Python code is straightforward. We append the new matrices required to compute the updated weights for the hidden layer.
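The hidden-layer step can be sketched in the same way. Following the approach used in this article's code, the output errors are sent back across the hidden-to-output weights with a transposed matrix product, and the result drives the update of the input-to-hidden weights. The names delta_hidden and wih_updated are illustrative, and the plain-NumPy sigmoid is assumed to be equivalent to expit.

```python
import numpy as np


def sigmoid(x):
    # plain-NumPy sigmoid, assumed equivalent to scipy.special.expit
    return 1.0 / (1.0 + np.exp(-x))


learning_rate = 1.0
inputs = np.array([[0.05], [0.10]])
targets = np.array([[0.01], [0.99]])
wih = np.array([[0.15, 0.20], [0.25, 0.30]])
who = np.array([[0.40, 0.45], [0.50, 0.55]])

# forward pass and output error
hidden_outputs = sigmoid(np.dot(wih, inputs))
final_outputs = sigmoid(np.dot(who, hidden_outputs))
output_errors = targets - final_outputs

# each hidden node collects the output errors in proportion to its outgoing weights
hidden_errors = np.dot(who.T, output_errors)

# same update rule as for the output layer, one layer earlier
delta_hidden = hidden_errors * hidden_outputs * (1.0 - hidden_outputs)
wih_updated = wih + learning_rate * np.dot(delta_hidden, inputs.T)

print(hidden_errors.ravel())
print(wih_updated)
```

Because the inputs are small (0.05 and 0.10), the hidden-layer weights move much less than the output-layer weights in this example.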
    def train(self, inputs_list, targets_list):
        # convert inputs and targets lists to 2d column vectors
        inputs = np.array(inputs_list, ndmin=2).T
        targets = np.array(targets_list, ndmin=2).T

        # calculate signals into hidden layer
        hidden_inputs = np.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = np.asarray(self.activation_function(hidden_inputs))

        # calculate signals into final output layer
        final_inputs = np.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = np.asarray(self.activation_function(final_inputs))

        # output layer error is the (target - actual)
        output_errors = np.asarray(targets - final_outputs)
        # hidden layer error is the output_errors, split by weights, recombined at hidden nodes
        hidden_errors = np.asarray(np.dot(self.who.T, output_errors))

        # update the weights for the links between the hidden and output layers
        self.who += self.lr * np.dot(
            output_errors * final_outputs * (1.0 - final_outputs),
            np.transpose(hidden_outputs))

        # update the weights for the links between the input and hidden layers
        self.wih += self.lr * np.dot(
            hidden_errors * hidden_outputs * (1.0 - hidden_outputs),
            np.transpose(inputs))
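To see the two updates working together, here is a compact sketch that repeats the training step on the single example from this article and compares the squared error before and after. The function names train_step and squared_error and the choice of 1000 iterations are illustrative assumptions; plain NumPy arrays and a plain-NumPy sigmoid stand in for the class.

```python
import numpy as np


def sigmoid(x):
    # plain-NumPy sigmoid, assumed equivalent to scipy.special.expit
    return 1.0 / (1.0 + np.exp(-x))


def train_step(wih, who, inputs, targets, lr):
    """One forward and backward pass; returns the updated weight matrices."""
    hidden_outputs = sigmoid(np.dot(wih, inputs))
    final_outputs = sigmoid(np.dot(who, hidden_outputs))
    output_errors = targets - final_outputs
    # hidden errors use the weights as they were before this step's update
    hidden_errors = np.dot(who.T, output_errors)
    new_who = who + lr * np.dot(
        output_errors * final_outputs * (1.0 - final_outputs), hidden_outputs.T)
    new_wih = wih + lr * np.dot(
        hidden_errors * hidden_outputs * (1.0 - hidden_outputs), inputs.T)
    return new_wih, new_who


def squared_error(wih, who, inputs, targets):
    """Sum of squared differences between targets and network outputs."""
    final_outputs = sigmoid(np.dot(who, sigmoid(np.dot(wih, inputs))))
    return float(np.sum((targets - final_outputs) ** 2))


inputs = np.array([[0.05], [0.10]])
targets = np.array([[0.01], [0.99]])
wih = np.array([[0.15, 0.20], [0.25, 0.30]])
who = np.array([[0.40, 0.45], [0.50, 0.55]])

error_before = squared_error(wih, who, inputs, targets)
for _ in range(1000):
    wih, who = train_step(wih, who, inputs, targets, lr=1.0)
error_after = squared_error(wih, who, inputs, targets)

print(error_before, error_after)
```

Training on a single example only shows that the update rule pushes the outputs toward the targets; it says nothing about how the network generalises.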
Our model is ready. In the next article, we will see the code in action with the famous handwritten dataset.