In this section, we’re going to continue supervised machine learning by looking into one of the most interesting algorithms I’ve come across in my software endeavours: artificial neural networks. Although the algorithm is over 50 years old, it’s only recently that technology has caught up. Faster processors and parallel computation are big driving factors, as are libraries such as TensorFlow that make neural network calculations easier to work with. Moreover, Big Data has begun to provide the massive datasets required to train ANNs to quite high accuracies.

If you’re unfamiliar with TensorFlow, here is a brief introduction. At a high level, it is a library that simplifies matrix multiplication and contains tons of helper functions for training and much more. A tensor is just an array or multidimensional array (a set of values) that TensorFlow can digest. In essence, we use TensorFlow syntax to model our computational graph, and then dispatch it into a session, which responds with an output.
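To make that concrete, here is a minimal sketch of the graph-then-session workflow (purely illustrative, and separate from the digit reader we’ll build below):

import tensorflow as tf

# build the computational graph; nothing is computed yet
a = tf.constant(3.0)
b = tf.constant(4.0)
total = a + b

# dispatch the graph into a session to get an actual value
with tf.Session() as sess:
	print(sess.run(total)) # 7.0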

Anyhow, without digressing too much, let’s jump right in.

Introduction to Neural Networks

A neural network is a simplified model of how the human brain works. It is a network of simple neurons, modelled very similarly to the diagram below:

[Figure: a single neuron, with weighted inputs, a bias, and a stepper function]

The inputs are multiplied by the weights, and the results are all summed together. Without going into too much detail about the bias, here is a simple explanation: it exists so that if all of the inputs are zero, we still have a value to work with that doesn’t just multiply to zero. We therefore add the bias to the summed (inputs * weights), and then proceed onward.

The stepper function (also known as the activation function) is a function that, depending on the value of x, outputs a one or a zero. As you can see, the graph jumps from zero to one at an inflection point of x. It’s worth noting that our neural network won’t use that exact graph, but a smooth curve known conveniently as a sigmoid function, shown below. Instead of snapping between 0 and 1, it produces a continuous range of outputs between the two.

S(t) = \frac{1}{1 + e^{-t}}
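To tie the inputs, weights, bias and activation together, here is a minimal sketch of a single neuron in plain Python (the values are made up for illustration):

import math

def neuron(inputs, weights, bias):
	# weighted sum of the inputs, plus the bias
	total = sum(i * w for i, w in zip(inputs, weights)) + bias
	# sigmoid activation: a smooth curve from 0 to 1
	return 1 / (1 + math.exp(-total))

print(neuron([0.5, 0.9], [0.8, -0.2], 0.1)) # ~0.579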

That’s pretty much all there is to this simple neural network. Every neuron is treated the same, and every weight is unique to the connection from neuron to neuron. It is these different weights that allow a neural network to change its behaviour based on inputs. These collections of weights, along with the network itself, give us a model to pass data through. Of course, the weights need to be trained; this is where the supervised learning part comes into play.

We subject the network to training data, and then tweak the weights accordingly. Since our network flows from layer to layer in the forward direction only, we call it a feed-forward neural network. Since it also contains more than one hidden layer, we call it a deep neural network. To tweak the weights, we use a process called back-propagation, since we adjust the weights as we move backwards through the network after each pass.

For our example, we’re going to train an ANN to read handwritten digits between 0 and 9. For this, we’re going to use the MNIST dataset, which contains 60,000 training images (digits between 0 and 9) mapped to the correct outputs, as well as 10,000 testing images to let us know how well we did after training.

The goal is to convert this:

[Figure: four handwritten MNIST digits]

To: 5 0 4 1. As you can see, this set has quite the soul (look again).

We also want our model to work for patterns it hasn’t even seen before. This is obviously easy for a human, but it’s not so easy for a machine. Well, it’s difficult, but it’s not THAT difficult, as we’ll see below. Enter the artificial neural network.

The Code

The first thing we want to do is import TensorFlow, as well as our dataset.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

Great, now the next step is to grab our dataset. MNIST is quite large, which makes it unreasonable to load it all into RAM in one go (although we probably could). You’ll see as you progress further into machine learning that Big Data is quite the driving factor: the training data you use can be so large that you simply can’t load it all into memory, so you load it in components. In fact, the data itself is often spread across multiple machines due to its enormous size. That’s what we’re going to do here. We define a batch size of 100, so that we grab 100 images at a time.

I’m going to talk about the ‘one_hot’ parameter afterwards, so don’t worry.

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
batch_size = 100

After this, we have to define how many neurons we want in each hidden layer. For this example, we’re going to use 500 in each, although this could vary depending on your use case. Three hidden layers of 500 neurons may not sound like much, but the weight matrices between the layers add up to nearly 900,000 connections (see the quick count below). These things can grow quite quickly, so it helps to have a decent machine when it comes to computing your model.

n_nodes_hl1 = 500
n_nodes_hl2 = 500
n_nodes_hl3 = 500
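As a quick sanity check on that count, we can sum the weighted connections between consecutive layers (a back-of-the-envelope sketch, counting weights only and ignoring biases):

# 784 inputs -> 500 -> 500 -> 500 -> 10 outputs
layer_sizes = [784, 500, 500, 500, 10]
connections = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
print(connections) # 897000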

Since our output digits vary from 0 to 9, we have 10 potential classifications. Now I can explain what the one_hot parameter means: rather than outputting a single digit like 2, the network outputs an array of 10 values, where every slot in the array represents a number on the scale from 0 to 9. You can see what 2 looks like in the comment below. The tensor length is how long our one-dimensional input array will be, since each 28×28 image is squashed into 784 values. The x and y values are just placeholders for now.

n_classes = 10 # Example: [0,0,1,0,0,0,0,0,0,0] == 2

# images are 28*28, which we squash into a
# one dimensional array of length 784
tensor_length = 28*28

# None leaves the batch size flexible; 784 is the flattened image
x = tf.placeholder('float', [None,tensor_length])
y = tf.placeholder('float')

Okay, now comes the neural network model itself. Bear with me, I’ll explain it as best as I can.

def neural_network_model(data):


	# biases make sure that inputs of zero still
	# produce a non-zero output

	hidden_layer_1 = {'weights':tf.Variable(tf.random_normal([tensor_length, n_nodes_hl1])),
					  'biases':tf.Variable(tf.random_normal([n_nodes_hl1]))}

	hidden_layer_2 = {'weights':tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
					  'biases':tf.Variable(tf.random_normal([n_nodes_hl2]))}

	hidden_layer_3 = {'weights':tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
					  'biases':tf.Variable(tf.random_normal([n_nodes_hl3]))}

	output_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl3, n_classes])),
					'biases':tf.Variable(tf.random_normal([n_classes]))}

	# (input_data * weights) + biases

	layer_1 = tf.add(tf.matmul(data,hidden_layer_1['weights']),hidden_layer_1['biases'])
	layer_1 = tf.nn.relu(layer_1)

	layer_2 = tf.add(tf.matmul(layer_1,hidden_layer_2['weights']),hidden_layer_2['biases'])
	layer_2 = tf.nn.relu(layer_2)

	layer_3 = tf.add(tf.matmul(layer_2,hidden_layer_3['weights']),hidden_layer_3['biases'])
	layer_3 = tf.nn.relu(layer_3)

	output = tf.matmul(layer_3, output_layer['weights']) + output_layer['biases']

	return output

We first model our hidden layers and the output layer, which is what you see at the start of the function. Each layer is a dictionary of weights and biases, in the form:

{'weights': tensorObject, 'biases': tensorObject}

After that’s all done, we calculate layer_1, layer_2 and layer_3, and then the output layer, which is what we return at the end. For each layer, we multiply the incoming matrix by the weights, and then add the biases. After that, we pass the result through tf.nn.relu(), which performs a threshold operation: any input value less than zero is set to zero.
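In other words, relu(x) = max(0, x). A quick sketch to see it in action:

import tensorflow as tf

# negatives are clamped to zero, positives pass through unchanged
with tf.Session() as sess:
	print(sess.run(tf.nn.relu([-2.0, -0.5, 0.0, 3.0]))) # [0. 0. 0. 3.]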

The next step is to train our network:

def train_neural_network(x):

	prediction = neural_network_model(x)
	# error between the predicted value and the real value
	cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y) )
	# lowers the cost function
	optimizer = tf.train.AdamOptimizer().minimize(cost)

	# number of cycles through the network
	num_of_epochs = 15

	with tf.Session() as sess:
		sess.run(tf.global_variables_initializer())

		for epoch in range(num_of_epochs):
			epoch_loss = 0
			for _ in range(int(mnist.train.num_examples/batch_size)):
				epoch_x,epoch_y = mnist.train.next_batch(batch_size)
				_,c = sess.run([optimizer,cost], feed_dict={x:epoch_x, y:epoch_y})
				epoch_loss += c
			print('Epoch', epoch, 'completed out of', num_of_epochs, 'loss', epoch_loss)
		correct = tf.equal(tf.argmax(prediction,1), tf.argmax(y,1))
		accuracy = tf.reduce_mean(tf.cast(correct,'float'))
		print('Accuracy',accuracy.eval({x:mnist.test.images, y:mnist.test.labels}))

The AdamOptimizer is simply a class that implements the Adam algorithm, a gradient-based optimizer that tweaks the weights to lessen our cost value. An epoch is one full cycle through the entire training set: for each batch, a forward pass through the network, followed by back-propagation to adjust the weights. That’s it, actually. Then we just look at the correctness, as well as the accuracy, to figure out how we did.
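If you ever want to tune it, AdamOptimizer accepts a learning rate; 0.001 is TensorFlow’s default, which is what we implicitly used above:

optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)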

You might be wondering, where is the back-propagation and the adjustment of the weights? Well, it turns out TensorFlow does that for you – which is awesome because it can be a huge pain in the butt.
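If you’re curious, minimize(cost) is roughly shorthand for two steps you could write out yourself (a sketch using TensorFlow’s optimizer API):

adam = tf.train.AdamOptimizer()
# back-propagation: compute the gradient of the cost w.r.t. each weight
grads_and_vars = adam.compute_gradients(cost)
# weight update: nudge each weight against its gradient
train_step = adam.apply_gradients(grads_and_vars)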

Let’s call train_neural_network(x) and see what our ANN outputs:

Epoch 0 completed out of 15 loss 1827994.0788
Epoch 1 completed out of 15 loss 397080.319996
Epoch 2 completed out of 15 loss 220061.084934
Epoch 3 completed out of 15 loss 132932.756693
Epoch 4 completed out of 15 loss 82708.8744059
Epoch 5 completed out of 15 loss 51131.8805857
Epoch 6 completed out of 15 loss 33124.0306211
Epoch 7 completed out of 15 loss 22423.5440493
Epoch 8 completed out of 15 loss 20369.9771181
Epoch 9 completed out of 15 loss 18138.3133205
Epoch 10 completed out of 15 loss 18065.5308844
Epoch 11 completed out of 15 loss 13574.5752766
Epoch 12 completed out of 15 loss 13169.764611
Epoch 13 completed out of 15 loss 15897.1297959
Epoch 14 completed out of 15 loss 11438.3036201

Accuracy 0.9548

The accuracy means that 95.48% of the time, our network was correct in its prediction of what number passed through it. For 15 passes and about 100 lines of code, I find that quite impressive. I mean, my laptop just learned how to read handwritten digits. How incredible.

Closing

I know this might have been pretty hard to follow for the uninitiated in ML, and I’m sorry for that. Some of these concepts are quite hard to simplify, and if I had tried to cover them all in depth, this blog post would have taken me forever to write. If you have any questions, drop them down below and I’ll do my best.

Cheers and happy coding.
