First Steps to Implement a Neural Network Model Using Theano and Lasagne

by Ghadir Eraisha,
Data Scientist @ Runtastic

Neural network models have improved dramatically over the last decade, for two main reasons. First, technological advances have made computing power cheaper and more efficient. Second, the performance of neural networks improves with the amount of data they are trained on, so the surge in big data has benefited neural-network-based models. We at Runtastic make use of these advancements and employ neural networks to empower our products with artificial intelligence.

As Data Scientists at Runtastic, we have tried several recently developed libraries for building and training neural networks. One of the best we have used is Theano, a Python library that offers both flexibility in building and adjusting models and high performance. Many libraries are built on top of Theano, among them Lasagne, which makes it easier to build, adjust and train neural networks. Yet Lasagne doesn't hide Theano's symbolic variables and expressions, so you can still manipulate them to adapt the architecture and the learning algorithm to your use case.

In this post, we will train a general-purpose multi-layer neural network to introduce you to Theano and Lasagne and highlight some handy features of both libraries.

1- The Network Architecture

We first declare the inputs and outputs of the network as Theano symbolic variables, which have no values assigned to them yet. These variables only define the pipeline of computations: Theano builds a computational graph from them, which it can then optimize to maintain high performance. We explain more about this in later steps.

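import lasagne
import theano
import theano.tensor as T
input_var = T.tensor4("inputs")    # 4-D input: (batch, channels, rows, columns)
target_var = T.ivector("targets")  # one integer class label per example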
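Theano's symbolic variables are easiest to understand with a toy model. The sketch below is plain Python, not Theano: the classes `Var` and `Op` are hypothetical and only illustrate the idea of building an expression graph first and evaluating it later, once values are bound to the variables.

```python
# A toy illustration of deferred (symbolic) computation, NOT Theano's API:
# expressions are built first and only evaluated once values are supplied.
class Var:
    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        # Look up the value bound to this variable at evaluation time
        return env[self.name]

    def __add__(self, other):
        return Op("add", self, other)

    def __mul__(self, other):
        return Op("mul", self, other)


class Op(Var):
    def __init__(self, kind, left, right):
        self.kind, self.left, self.right = kind, left, right

    def evaluate(self, env):
        # Evaluate both operands recursively, then apply this node's operation
        a, b = self.left.evaluate(env), self.right.evaluate(env)
        return a + b if self.kind == "add" else a * b


x, y = Var("x"), Var("y")
expr = x * y + x                          # builds a graph; nothing is computed yet
print(expr.evaluate({"x": 3, "y": 4}))    # prints 15
```

Theano goes much further, optimizing such a graph and compiling it to C or GPU code, but the separation between defining expressions and supplying values is the same.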

Then, we define the network architecture. Lasagne already implements the commonly used layer types, and we store the layers in a dictionary keyed by name. The first layer takes the extracted features as input; the layers are then connected in series, with the output of each layer serving as the input of the next, and the last layer generates the output.

# data_size: input shape tuple, e.g. (None, channels, rows, columns), where None
# allows any batch size; n: number of classes. Both are defined elsewhere.
network = {}
# Input layer:
network["input_layer"] = lasagne.layers.InputLayer(shape=data_size, input_var=input_var)
# Two fully-connected layers of 500 units, using the linear rectifier (ReLU),
# and initializing weights with Glorot's scheme:
network["hidden1"] = lasagne.layers.DenseLayer(
    network["input_layer"], num_units=500,
    nonlinearity=lasagne.nonlinearities.rectify,
    W=lasagne.init.GlorotUniform())
network["hidden2"] = lasagne.layers.DenseLayer(
    network["hidden1"], num_units=500,
    nonlinearity=lasagne.nonlinearities.rectify)  # GlorotUniform is the default W
# Softmax output layer with one unit per class, matching the categorical
# cross-entropy loss and the argmax accuracy used below:
network["out"] = lasagne.layers.DenseLayer(
    network["hidden2"], num_units=n,
    nonlinearity=lasagne.nonlinearities.softmax)
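To make the hidden layers concrete, here is a plain-Python sketch of what one dense layer with a rectifier computes for a single example, `max(0, x·W + b)`, together with Glorot uniform initialization. The function names are hypothetical; the bound `sqrt(6 / (fan_in + fan_out))` corresponds, as far as we know, to `lasagne.init.GlorotUniform()` with its default gain of 1.0. Real layers operate on whole batches stored in NumPy arrays rather than Python lists.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    # Glorot/Xavier uniform: sample from U(-a, a) with a = sqrt(6 / (fan_in + fan_out))
    a = math.sqrt(6.0 / (fan_in + fan_out))
    # Weight matrix of shape (fan_in, fan_out), as a list of rows
    return [[rng.uniform(-a, a) for _ in range(fan_out)] for _ in range(fan_in)]


def dense_relu(x, W, b):
    # One fully-connected layer for a single example: relu(x @ W + b)
    return [max(0.0, sum(x[i] * W[i][j] for i in range(len(x))) + b[j])
            for j in range(len(b))]


rng = random.Random(0)
W = glorot_uniform(4, 3, rng)                    # 4 inputs -> 3 hidden units
h = dense_relu([1.0, 2.0, 3.0, 4.0], W, [0.0] * 3)
```

Scaling the initial weights by the layer's fan-in and fan-out keeps the variance of activations roughly stable from layer to layer, which is why the scheme is a common default.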

The number of output units depends on the problem itself; for classification problems, for example, it is the number of classes. Also, the activation function of the output units depends on the problem. For binary-class problems, the sigmoid function is commonly used, while for multi-class problems, the softmax function is commonly used.
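The difference between the two output activations can be sketched in plain Python (the helper names are hypothetical): sigmoid squashes each unit independently into (0, 1), whereas softmax turns the whole output vector into a single probability distribution over the classes.

```python
import math


def sigmoid(z):
    # Independent probability for one unit
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    # Subtract the max for numerical stability; outputs are positive and sum to 1
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]
print([round(sigmoid(z), 3) for z in scores])   # one probability per unit
print([round(p, 3) for p in softmax(scores)])   # one distribution over classes
```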

2- Training Loss Function & Update Rule

We define the loss expression, a scalar objective we want to minimize. For a classification problem like ours, we minimize the categorical cross-entropy between the predicted class probabilities and the targets, averaged over the batch. Most of the commonly used loss expressions are already implemented in Lasagne.

prediction = lasagne.layers.get_output(network["out"])
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
loss = loss.mean()
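With integer targets, as declared by `T.ivector`, the categorical cross-entropy of one example is the negative log of the probability the network assigns to the true class, and `.mean()` averages it over the batch. A plain-Python sketch with a hypothetical helper name:

```python
import math


def categorical_crossentropy(predictions, targets):
    # predictions: one probability distribution per example
    # targets: integer class index per example (like the ivector target_var)
    return [-math.log(p[t]) for p, t in zip(predictions, targets)]


preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
losses = categorical_crossentropy(preds, [0, 1])
mean_loss = sum(losses) / len(losses)   # the equivalent of loss.mean()
```

The loss is zero only when the true class receives probability 1 and grows without bound as that probability approaches 0.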

Then, we create the update rule for training, which defines how the network weights are modified at each training step. We use Stochastic Gradient Descent with Nesterov momentum. Because we have so far declared the whole computation pipeline with symbolic expressions, Theano can derive the partial derivatives of the loss with respect to the weights automatically; Lasagne's update functions build these gradient expressions for us.

params = lasagne.layers.get_all_params(network["out"], trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01, momentum=0.9)
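The Nesterov momentum rule keeps a velocity per parameter. A minimal scalar sketch, assuming the formulation documented for Lasagne's `nesterov_momentum` (first `v = momentum * v - learning_rate * grad`, then `param = param + momentum * v - learning_rate * grad`); the helper name is hypothetical and the toy objective `f(w) = w**2` is our own choice:

```python
def nesterov_step(param, velocity, grad, learning_rate=0.01, momentum=0.9):
    # Update the velocity with the current gradient
    velocity = momentum * velocity - learning_rate * grad
    # Step along the new velocity plus the gradient step ("look-ahead")
    param = param + momentum * velocity - learning_rate * grad
    return param, velocity


# Minimize f(w) = w**2, whose gradient is 2*w
w, v = 1.0, 0.0
for _ in range(100):
    w, v = nesterov_step(w, v, 2 * w, learning_rate=0.1)
```

In the network, the same rule is applied elementwise to every weight matrix and bias vector returned by `get_all_params`.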

3- Testing Loss Function & Accuracy

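We define the loss expression for testing. In contrast to the training loss expression, we pass deterministic=True to get_output, which disables stochastic behavior such as dropout during the forward pass. Our network has no dropout layers, so both expressions compute the same values here, but keeping them separate means the test expressions stay correct if such layers are added later.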

test_prediction = lasagne.layers.get_output(network["out"], deterministic=True)
test_loss = lasagne.objectives.categorical_crossentropy(test_prediction, target_var)
test_loss = test_loss.mean()
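A hypothetical plain-Python dropout function shows what `deterministic=True` switches off. It assumes inverted dropout, which to our knowledge is what Lasagne's `DropoutLayer` does by default (`rescale=True`): during training, units are zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, while the deterministic pass returns the activations unchanged.

```python
import random


def dropout(activations, p=0.5, deterministic=False, rng=random):
    # At test time (deterministic=True) the layer is the identity
    if deterministic or p == 0:
        return list(activations)
    # During training, drop each unit with probability p and rescale survivors
    # by 1 / (1 - p) so the expected activation is unchanged
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


acts = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(acts, p=0.5, rng=random.Random(42))   # random zeros, 2x scaling
test_out = dropout(acts, p=0.5, deterministic=True)       # always equals acts
```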

Additionally, we define an expression that computes the classification accuracy of the model, i.e. the fraction of examples whose highest-scoring class matches the target.

test_acc = T.mean(T.eq(T.argmax(test_prediction, axis=1), target_var),
                 dtype=theano.config.floatX)
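The accuracy expression takes the argmax of each row of predictions and compares it with the integer target. A plain-Python equivalent (hypothetical helper name):

```python
def accuracy(predictions, targets):
    # Index of the highest-scoring class for each example (argmax along axis 1)
    predicted = [max(range(len(p)), key=p.__getitem__) for p in predictions]
    # Mean of the elementwise equality between predicted and target classes
    return sum(int(c == t) for c, t in zip(predicted, targets)) / len(targets)


preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
print(accuracy(preds, [0, 1, 0]))   # 2 of 3 examples classified correctly
```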

4- Compiling the Training & Testing Functions

This is where the actual computation happens. Compiling turns the symbolic graph into callable functions that take input values and return the network's outputs. The training function passes a batch of examples through the network, performs one training step by applying the updates defined above, and returns the corresponding training loss.

train_fn = theano.function([input_var, target_var], loss, updates=updates)

Then, we compile another function to compute the validation loss and accuracy.

val_fn = theano.function([input_var, target_var], [test_loss, test_acc])
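The `updates` argument is what turns `train_fn` into a training step. A rough plain-Python analogue, assuming a toy one-dimensional linear model trained with plain SGD on mean squared error instead of our network (all names are hypothetical): the returned function computes the loss, then modifies the shared parameters in place.

```python
def make_train_fn(params, learning_rate=0.1):
    # params: dict with scalar weight "w" and bias "b" for a 1-D linear model
    def train_fn(x_batch, y_batch):
        n = len(x_batch)
        errors = [params["w"] * x + params["b"] - y for x, y in zip(x_batch, y_batch)]
        loss = sum(e * e for e in errors) / n          # mean squared error
        grad_w = sum(2 * e * x for e, x in zip(errors, x_batch)) / n
        grad_b = sum(2 * e for e in errors) / n
        # The loss was computed with the parameters from before this step;
        # the update is applied afterwards and persists between calls
        params["w"] -= learning_rate * grad_w
        params["b"] -= learning_rate * grad_b
        return loss
    return train_fn


params = {"w": 0.0, "b": 0.0}
train_fn = make_train_fn(params)
for _ in range(200):
    loss = train_fn([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])   # data from y = 2x + 1
```

Theano behaves the same way: outputs and update expressions are all evaluated with the parameter values from before the call, so the loss returned by `train_fn` describes the weights before that call's update.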

5- Iterative Training

We train the network iteratively over mini-batches of the training set.

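# x_train: NumPy array of inputs (dtype floatX); y_train: int32 class labels
epochs = 500
batch_size = 500
n_batches = len(x_train) // batch_size  # integer division; the last partial batch is skipped
for epoch in range(epochs):
    train_err = 0.0  # reset the accumulated loss at the start of every epoch
    for batch in range(n_batches):
        x_batch = x_train[batch * batch_size:(batch + 1) * batch_size]
        y_batch = y_train[batch * batch_size:(batch + 1) * batch_size]
        train_err += train_fn(x_batch, y_batch)
    print("Epoch {}: training loss {:.6f}".format(epoch + 1, train_err / n_batches))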
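The loop above always visits the examples in the same order. A common refinement, not part of the original recipe, is to shuffle the training set every epoch. Below is a plain-Python sketch of such a mini-batch generator; the name `iterate_minibatches` is our own choice, and like the loop above it drops the last incomplete batch.

```python
import random


def iterate_minibatches(inputs, targets, batch_size, shuffle=True, rng=random):
    # Yield (x_batch, y_batch) pairs covering only complete batches
    assert len(inputs) == len(targets)
    indices = list(range(len(inputs)))
    if shuffle:
        rng.shuffle(indices)   # new random order on every call, i.e. every epoch
    for start in range(0, len(inputs) - batch_size + 1, batch_size):
        chunk = indices[start:start + batch_size]
        yield [inputs[i] for i in chunk], [targets[i] for i in chunk]


x = list(range(10))
y = [v % 2 for v in x]
batches = list(iterate_minibatches(x, y, batch_size=4, rng=random.Random(0)))
```

With NumPy arrays, indexing with an index array (`inputs[chunk]`) is the idiomatic replacement for the list comprehensions, and each yielded pair can be passed straight to `train_fn`.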

6- Testing

Finally, we compute the loss and accuracy on the test set.

err, acc = val_fn(x_test, y_test)
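Passing the whole test set to `val_fn` at once can exhaust memory for large data sets. Here is a sketch of batched evaluation, assuming `val_fn` returns the mean loss and accuracy of each batch (as the compiled function above does); the helper name is hypothetical. Weighting by batch size keeps the averages exact when the last batch is smaller.

```python
def evaluate_in_batches(val_fn, x_test, y_test, batch_size=500):
    # Accumulate loss and accuracy weighted by batch size, so a smaller final
    # batch does not skew the averages
    total_err = total_acc = 0.0
    for start in range(0, len(x_test), batch_size):
        x_batch = x_test[start:start + batch_size]
        y_batch = y_test[start:start + batch_size]
        err, acc = val_fn(x_batch, y_batch)
        total_err += err * len(x_batch)
        total_acc += acc * len(x_batch)
    return total_err / len(x_test), total_acc / len(x_test)
```

Usage: `err, acc = evaluate_in_batches(val_fn, x_test, y_test)` replaces the single call above.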

***


Runtastic Tech Team: We are made up of all the tech departments at Runtastic, like iOS, Android, Web, Infrastructure, DataEngineering, etc. We're eager to tell you how we work and what we have learned along the way.
