# First Steps to Implement a Neural Network Model Using Theano and Lasagne

_by Ghadir Eraisha, Data Scientist @ Runtastic_

Neural network models have improved dramatically over the last decade, for two main reasons. First, technological advances have made computing power cheaper and more efficient. Second, the performance of a neural network generally improves with the amount of data it is trained on, so the surge in big data has benefited neural-network-based models. At Runtastic, we make use of these advances and employ neural networks to empower our products with artificial intelligence.

As Data Scientists at Runtastic, we have tried some of the recently developed libraries for building and training neural networks. One of the best libraries we have used is Theano, a Python library that combines flexibility in building and adjusting models with high performance. Several libraries are built on top of Theano, including Lasagne, which makes it easier to build, adjust and train neural networks. Lasagne does not hide Theano's symbolic variables and expressions, however, so you can still manipulate them to adapt the architecture and the learning algorithm to your use case.

In this post, we train a general-purpose multi-layer neural network to introduce you to Theano and Lasagne and to highlight some handy features of these libraries.

**1- The Network Architecture**

We declare the inputs and outputs of the network as Theano symbolic variables, which have no values assigned to them yet. These variables only define the pipeline of computations. The idea is that Theano builds a computational graph from them, which lets it optimize the computations and maintain high performance. We explain more about this in later steps.

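    import theano
    import theano.tensor as T
    import lasagne

    # Symbolic placeholders: a 4D tensor of inputs and a vector of integer class labels.
    input_var = T.tensor4("inputs")
    target_var = T.ivector("targets")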
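
To make the idea of "variables with no values yet" more concrete, here is a minimal sketch of deferred computation in plain Python. It is a toy illustration only and not how Theano is implemented; the class names `Expr`, `Var`, `Add` and `Mul` are made up for this example.

```python
# Toy illustration of deferred (symbolic) computation; NOT Theano's implementation.
class Expr:
    """Base class for nodes in a tiny expression graph."""

    def __add__(self, other):
        return Add(self, other)

    def __mul__(self, other):
        return Mul(self, other)


class Var(Expr):
    """A named placeholder that carries no value until evaluation."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, values):
        return values[self.name]


class Add(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, values):
        return self.left.evaluate(values) + self.right.evaluate(values)


class Mul(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, values):
        return self.left.evaluate(values) * self.right.evaluate(values)


# Build the graph first: no numbers are involved at this point.
x, w, b = Var("x"), Var("w"), Var("b")
y = x * w + b

# Supply concrete values only when the result is needed.
result = y.evaluate({"x": 2.0, "w": 3.0, "b": 1.0})
```

Because the whole expression is known before any value is supplied, a library working this way has the chance to inspect and rearrange the graph before running it, which is the property the paragraph above refers to.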

Then, we define the neural network architecture. Lasagne already implements the commonly used layer types. We store the layers, along with their specifications, in a Python dictionary. The first layer takes the extracted features as input. The layers are then connected in series, so that the output of one layer is the input of the next, and the last layer generates the output.

    # `data_size` (the input shape) and `n` (the number of output units)
    # must be defined for your data set.
    network = {}

    # Input layer:
    network["input_layer"] = lasagne.layers.InputLayer(
        shape=data_size, input_var=input_var)

    # Two fully-connected layers of 500 units each, using the linear rectifier
    # and initializing weights with Glorot's scheme:
    network["hidden1"] = lasagne.layers.DenseLayer(
        network["input_layer"], num_units=500,
        nonlinearity=lasagne.nonlinearities.rectify,
        W=lasagne.init.GlorotUniform())
    network["hidden2"] = lasagne.layers.DenseLayer(
        network["hidden1"], num_units=500,
        nonlinearity=lasagne.nonlinearities.rectify)

    # Softmax output layer of n units (one per class); softmax matches the
    # categorical cross-entropy loss and the argmax-based accuracy used later:
    network["out"] = lasagne.layers.DenseLayer(
        network["hidden2"], num_units=n,
        nonlinearity=lasagne.nonlinearities.softmax)

The number of output units depends on the problem itself; for classification problems, for example, it is the number of classes. The activation function of the output units also depends on the problem. For binary classification, the sigmoid function is commonly used, while for multi-class problems the softmax function is the usual choice.
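
As a rough illustration of the difference between the two output activations, the following plain-Python sketch computes both by hand. The function names `sigmoid` and `softmax` are defined here for the example and are not the Lasagne implementations.

```python
import math


def sigmoid(z):
    """Map one real-valued score to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative inputs, use the equivalent form that avoids overflow.
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores):
    """Map a list of scores to probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum keeps exp() in range
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# One output unit: probability of the positive class.
p_positive = sigmoid(0.0)

# Three output units: one probability per class.
class_probs = softmax([2.0, 1.0, 0.1])
```

The sigmoid treats each output independently, whereas the softmax couples the outputs so that they form a single distribution over the classes.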

**2- Training Loss Function & Update Rule**

We define the loss expression, a scalar objective that we want to minimize. For a classification problem with integer class labels, as here, we minimize the categorical cross-entropy loss; for a binary problem with a single sigmoid output, the binary cross-entropy would be used instead. Most of the commonly used loss expressions are already implemented in Lasagne.

    prediction = lasagne.layers.get_output(network["out"])
    loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
    loss = loss.mean()
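
To show what this loss measures, here is a small hand-computed sketch in plain Python, assuming each prediction is a list of class probabilities and each target is an integer class index. The helper name `crossentropy_per_example` and the sample numbers are made up for illustration.

```python
import math


def crossentropy_per_example(predictions, targets):
    """Return -log(probability assigned to the true class) for each example."""
    return [-math.log(probs[target]) for probs, target in zip(predictions, targets)]


# Two examples, three classes; each row sums to 1.
predictions = [[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1]]
targets = [0, 1]

losses = crossentropy_per_example(predictions, targets)

# Averaging over the batch gives the scalar objective.
mean_loss = sum(losses) / len(losses)
```

The loss for an example shrinks towards zero as the probability assigned to its true class approaches 1, and grows without bound as that probability approaches 0.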

Then, we create the update rule for training, which defines how the network weights are modified at each training step. We use Stochastic Gradient Descent with Nesterov momentum. Because the computation pipeline so far consists of symbolic expressions, Theano can automatically derive the partial derivatives of the loss expression with respect to the weights.

    params = lasagne.layers.get_all_params(network["out"], trainable=True)
    updates = lasagne.updates.nesterov_momentum(
        loss, params, learning_rate=0.01, momentum=0.9)
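
The following plain-Python sketch shows one common formulation of a Nesterov momentum step, applied to a single scalar weight. It is an assumption-laden illustration rather than the library's code: the function `nesterov_step` and the toy objective `(w - 3)^2` are made up for this example, and only the learning rate and momentum values are taken from the call above.

```python
def nesterov_step(param, velocity, grad, learning_rate=0.01, momentum=0.9):
    """Apply one Nesterov momentum update and return (new_param, new_velocity)."""
    # The velocity accumulates a decaying sum of past gradient steps.
    velocity = momentum * velocity - learning_rate * grad
    # The parameter moves by the look-ahead velocity plus the plain gradient step.
    param = param + momentum * velocity - learning_rate * grad
    return param, velocity


def grad_fn(w):
    """Gradient of the toy objective (w - 3)^2, which is minimized at w = 3."""
    return 2.0 * (w - 3.0)


w, v = 0.0, 0.0
for _ in range(500):
    w, v = nesterov_step(w, v, grad_fn(w))
```

After the loop, `w` should have moved from its starting value of 0 to very near the minimizer at 3. In the real network, the same kind of update is applied to every trainable parameter, with the gradients supplied by Theano.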

**3- Testing Loss Function & Accuracy**

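We also define the loss expression for testing. It mirrors the training loss, except that we request a deterministic forward pass through the network, which disables stochastic layers such as dropout. This network has no such layers, so the two passes are equivalent here, but the flag matters as soon as such layers are added.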

    test_prediction = lasagne.layers.get_output(network["out"], deterministic=True)
    test_loss = lasagne.objectives.categorical_crossentropy(test_prediction, target_var)
    test_loss = test_loss.mean()

Additionally, we define an expression which computes the accuracy of the model.

    # Fraction of examples whose most probable class equals the target class.
    test_acc = T.mean(T.eq(T.argmax(test_prediction, axis=1), target_var),
                      dtype=theano.config.floatX)
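
The accuracy expression can be read as "take the most probable class per example, compare it with the target, and average". A plain-Python sketch of the same computation, with a hypothetical helper `accuracy` and made-up sample data, might look like this:

```python
def accuracy(predictions, targets):
    """Fraction of examples whose highest-probability class matches the target."""
    correct = 0
    for probs, target in zip(predictions, targets):
        # Index of the largest probability, i.e. the argmax over classes.
        predicted_class = max(range(len(probs)), key=probs.__getitem__)
        correct += int(predicted_class == target)
    return correct / float(len(targets))


predictions = [[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.3, 0.3, 0.4],
               [0.5, 0.4, 0.1]]
targets = [0, 1, 2, 1]

acc = accuracy(predictions, targets)
```

Here the first three examples are classified correctly and the last one is not, so three out of four predictions match.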

**4- Compiling the Training & Testing Functions**

Up to this point we have only built symbolic expressions. Compiling them into callable functions is what lets us pass in actual input values and receive the network's outputs. We compile a function that takes a batch of examples, performs one training step on it and returns the corresponding training loss.

    train_fn = theano.function([input_var, target_var], loss, updates=updates)

Then, we compile another function to compute the validation loss and accuracy.

    val_fn = theano.function([input_var, target_var], [test_loss, test_acc])

**5- Iterative Training**

We train the neural network iteratively over mini-batches of the training set.

    epochs = 500
    batch_size = 500
    # Integer division; examples beyond the last full batch are skipped.
    batches = n_examples // batch_size

    for epoch in range(epochs):
        train_err = 0  # accumulated training loss for this epoch
        for batch in range(batches):
            x_batch = x_train[batch * batch_size: (batch + 1) * batch_size]
            y_batch = y_train[batch * batch_size: (batch + 1) * batch_size]
            train_err += train_fn(x_batch, y_batch)
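
The loop above visits the batches in a fixed order and ignores any examples left over when the data set size is not a multiple of the batch size. One possible variation, sketched here in plain Python with lists, is a generator that optionally shuffles the examples and also yields the final, smaller batch. The helper name `iterate_minibatches` and its parameters are assumptions made for this example; with NumPy arrays you would index with an array of indices instead of the list comprehensions.

```python
import random


def iterate_minibatches(inputs, targets, batch_size, shuffle=False, seed=None):
    """Yield (inputs, targets) batches, including a final partial batch."""
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length")
    indices = list(range(len(inputs)))
    if shuffle:
        # A seeded generator keeps the shuffling reproducible.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        yield [inputs[i] for i in chunk], [targets[i] for i in chunk]


# Ten toy examples split into batches of four: the last batch holds the remainder.
x_demo = list(range(10))
y_demo = [value % 2 for value in x_demo]
batch_sizes = [len(x_batch)
               for x_batch, _ in iterate_minibatches(x_demo, y_demo, batch_size=4)]
```

Inside the training loop, each yielded pair would be passed to `train_fn` in place of the manually sliced `x_batch` and `y_batch`.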

**6- Testing**

Finally, we calculate the loss and accuracy on the test set.

    err, acc = val_fn(x_test, y_test)

***