THE NON-SCHOLARLY WRITE-UP: LeNet architecture applied to the MNIST dataset, 99% accuracy.


For about a week or so, I have been working on the ‘Digit Recognizer’ competition over at Kaggle. I started out with my favourite go-to algorithm, Random Forest, and eventually moved on to other implementations and variations, including:

  • KNN
  • KNN with PCA
  • XGBoost
  • Deep learning with H2O
  • GBM with H2O
  • Ensembling

And then I plateaued at 97.8%.

A quick Google search not too different from ‘improve score + Digit Recognizer + MNIST’ threw up a bunch of pages, all of which seemed to talk about neural networks. I’m like, huh? Isn’t that biology?

Sure is. Who’da thunk it!

Anyway, I spent considerable time poring over a few AMAZING bookmarkable resources and implemented my first ConvNet (I feel so accomplished!).

The implementation in question is called LeNet. LeNet is one of the best-known convolutional network architectures, originally used to read zip codes, digits, etc.


The model consists of a convolutional layer followed by a pooling layer, another convolutional layer followed by a pooling layer, and then two fully connected layers similar to those in a conventional multilayer perceptron. Each convolutional layer, and the first fully connected layer, is followed by a ReLU activation.
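To see how a 28 x 28 image shrinks as it passes through these layers, here is a small base-R sketch of the usual output-size arithmetic, out = (in - kernel + 2 * pad) / stride + 1, applied to the layer settings used in Step 7 (5 x 5 convolutions with mxnet's defaults of stride 1 and no padding, and 2 x 2 max pooling with stride 2). The helper `out_size` is hypothetical and not part of mxnet.

```r
# Output width/height of a convolution or pooling layer:
# out = (in - kernel + 2 * pad) / stride + 1   (integer division, i.e. floor)
out_size <- function(input, kernel, stride = 1, pad = 0) {
  (input - kernel + 2 * pad) %/% stride + 1
}

s_conv1 <- out_size(28, kernel = 5)                    # 5x5 conv, 20 filters
s_pool1 <- out_size(s_conv1, kernel = 2, stride = 2)   # 2x2 max pool
s_conv2 <- out_size(s_pool1, kernel = 5)               # 5x5 conv, 50 filters
s_pool2 <- out_size(s_conv2, kernel = 2, stride = 2)   # 2x2 max pool

# Values fed to the first fully connected layer after flattening
flattened <- s_pool2 * s_pool2 * 50

cat("conv1:", s_conv1, "pool1:", s_pool1,
    "conv2:", s_conv2, "pool2:", s_pool2,
    "flattened:", flattened, "\n")
# conv1: 24 pool1: 12 conv2: 8 pool2: 4 flattened: 800
```

So the two fully connected layers see an 800-value feature vector rather than the raw 784 pixels.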

Step 1: Load libraries

library(mxnet)

Step 2: Read the datasets

These are available from the Kaggle ‘Digit Recognizer’ competition page.

Each image is represented as a single row of 784 pixel values (a 28 x 28 image); in train.csv the first column holds the label. Pixel values range from 0 to 255.
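As an illustration of that row layout, here is a base-R sketch that turns one 784-value row back into a 28 x 28 image. It assumes the convention from the competition's data description: column pixelX holds the pixel at row i and column j, with X = i * 28 + j (both 0-based), so pixels are stored row by row. The synthetic `row` vector is a stand-in for a real row of test.csv.

```r
# Synthetic stand-in for one row of test.csv: 784 values in 0..255
row <- (0:783) %% 256

# Pixels are stored row by row, so fill the matrix by rows
img <- matrix(row, nrow = 28, ncol = 28, byrow = TRUE)

# pixel31 (0-based index 31) sits in row 2, column 4 of the image (1-based)
img[2, 4] == row[31 + 1]
# [1] TRUE
```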

trainorig <- read.csv("C:/Users/Amita/Downloads/train.csv", header = TRUE, sep = ",")
testorig  <- read.csv("C:/Users/Amita/Downloads/test.csv", header = TRUE, sep = ",")

Step 3: Convert the training and testing datasets into matrices
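The code for this step isn't shown. Since Step 9 calculates accuracy on `test`, one possible setup (an assumption, not necessarily what was run here) is to convert the labelled data with `data.matrix()` and hold out part of it. A minimal base-R sketch, using a small synthetic data frame in place of `trainorig` and a hypothetical 80/20 split:

```r
set.seed(42)

# Synthetic stand-in for trainorig: a label column plus 784 pixel columns
n <- 50
fake <- data.frame(label = sample(0:9, n, replace = TRUE),
                   matrix(sample(0:255, n * 784, replace = TRUE), nrow = n))

# data.matrix() turns an all-numeric data frame into a plain numeric matrix
all_train <- data.matrix(fake)

# Hypothetical 80/20 hold-out split so accuracy can be measured later
idx   <- sample(nrow(all_train), size = 0.8 * nrow(all_train))
train <- all_train[idx, ]
test  <- all_train[-idx, ]

dim(train)  # 40 785
dim(test)   # 10 785
```

In this setup the held-out labels would be `test[, 1]` and the held-out pixels `test[, -1]`.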


Step 4: Extract the labels

train.y <- train[, 1]  # labels: the first column of the training matrix

Step 5: Scale the data and transpose the matrices, since mxnet seems to prefer observations in columns rather than rows.


The transposed matrix has dimensions npixel x nexample: 784 rows, one column per image.
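The scaling code itself isn't shown. A common choice, and the assumption in this sketch, is to divide the pixel values by 255 so they fall in [0, 1]. The base-R example below uses a small synthetic matrix in place of `train` to show the scaling and the resulting npixel x nexample layout.

```r
set.seed(1)

# Synthetic stand-in for `train`: 5 images, label column + 784 pixels
train <- cbind(sample(0:9, 5, replace = TRUE),
               matrix(sample(0:255, 5 * 784, replace = TRUE), nrow = 5))

train.y <- train[, 1]            # labels
train.x <- t(train[, -1] / 255)  # scale to [0, 1], then one column per image

dim(train.x)    # 784 5
range(train.x)  # within [0, 1]
```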

Step 6: Convert the matrices into arrays for LeNet

train.array <- train.x
dim(train.array) <- c(28, 28, 1, ncol(train.x))  # width x height x channel x example
test.array <- test
dim(test.array) <- c(28, 28, 1, ncol(test))

Each input x is a 28 x 28 x 1 array representing one image: the first two numbers are the width and height in pixels, and the third is the number of channels (1 for grayscale images, 3 for RGB images). The fourth dimension indexes the images.
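Assigning `dim()` doesn't move any data; R just reinterprets the vector in column-major order, filling the first dimension fastest. Combined with the row-by-row pixel order of the Kaggle files, this means each 28 x 28 slice holds its image transposed. The base-R check below (synthetic data, 3 images) illustrates this. Because training and test data go through the same reshape, the network always sees the same orientation, so this shouldn't affect accuracy.

```r
set.seed(7)

# Synthetic npixel x nexample matrix, shaped like the output of Step 5 (3 images)
x <- matrix(runif(784 * 3), nrow = 784, ncol = 3)

arr <- x
dim(arr) <- c(28, 28, 1, ncol(x))  # width x height x channel x example

# The second image rebuilt the "natural" way (pixels stored row by row)
img2 <- matrix(x[, 2], nrow = 28, byrow = TRUE)

# The array slice is the transpose of that image
all.equal(arr[, , 1, 2], t(img2))
# [1] TRUE
```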

Step 7: Configure the structure of the network

# Convolutional NN
data <- mx.symbol.Variable('data')
# first conv: 20 filters of size 5x5, ReLU, then 2x2 max pooling
conv1 <- mx.symbol.Convolution(data = data, kernel = c(5, 5), num_filter = 20)
relu1 <- mx.symbol.Activation(data = conv1, act_type = "relu")
pool1 <- mx.symbol.Pooling(data = relu1, pool_type = "max",
                           kernel = c(2, 2), stride = c(2, 2))
# second conv: 50 filters of size 5x5, ReLU, then 2x2 max pooling
conv2 <- mx.symbol.Convolution(data = pool1, kernel = c(5, 5), num_filter = 50)
relu2 <- mx.symbol.Activation(data = conv2, act_type = "relu")
pool2 <- mx.symbol.Pooling(data = relu2, pool_type = "max",
                           kernel = c(2, 2), stride = c(2, 2))
# first fullc: flatten the feature maps, 500 hidden units, ReLU
flatten <- mx.symbol.Flatten(data = pool2)
fc1 <- mx.symbol.FullyConnected(data = flatten, num_hidden = 500)
relu3 <- mx.symbol.Activation(data = fc1, act_type = "relu")
# second fullc: one output per digit class
fc2 <- mx.symbol.FullyConnected(data = relu3, num_hidden = 10)
# loss: softmax over the 10 classes
lenet <- mx.symbol.SoftmaxOutput(data = fc2)
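As a sanity check on the size of this network, the trainable weights and biases can be counted by hand. A base-R sketch, assuming one bias per filter or hidden unit and that the flattened output of pool2 is 4 x 4 x 50 = 800 values for a 28 x 28 input (pooling and activation layers have no parameters). The helpers `conv_params` and `fc_params` are hypothetical.

```r
# Trainable parameters = weights + biases
conv_params <- function(k, in_ch, out_ch) k * k * in_ch * out_ch + out_ch
fc_params   <- function(n_in, n_out) n_in * n_out + n_out

params <- c(
  conv1 = conv_params(5, 1, 20),       # 5x5 kernels on 1 input channel
  conv2 = conv_params(5, 20, 50),      # 5x5 kernels on 20 input channels
  fc1   = fc_params(4 * 4 * 50, 500),  # flattened 4x4x50 map -> 500 units
  fc2   = fc_params(500, 10)           # 500 units -> 10 digit classes
)

params
sum(params)  # 431080
```

Over 90% of the parameters sit in fc1, which is typical for LeNet-style networks.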

Step 8: Train the model

devices <- mx.cpu()  # assumed: train on the CPU (mx.gpu() if one is available)
model <- mx.model.FeedForward.create(lenet, X = train.array, y = train.y,
                                     ctx = devices, num.round = 20, array.batch.size = 100,
                                     learning.rate = 0.05, momentum = 0.9, wd = 0.00001)
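The `learning.rate`, `momentum` and `wd` arguments configure stochastic gradient descent with momentum and L2 weight decay. The sketch below shows that update rule applied to a toy one-parameter problem rather than the network. The assumption is that mxnet's SGD optimizer follows essentially this form (momentum buffer updated with the gradient plus `wd * weight`), minus details such as gradient rescaling and clipping; `sgd_momentum_step` is a hypothetical helper.

```r
# One step of SGD with momentum and L2 weight decay (hypothetical helper):
#   mom    <- momentum * mom - lr * (grad + wd * weight)
#   weight <- weight + mom
sgd_momentum_step <- function(weight, mom, grad,
                              lr = 0.05, momentum = 0.9, wd = 1e-5) {
  mom <- momentum * mom - lr * (grad + wd * weight)
  list(weight = weight + mom, mom = mom)
}

# Toy problem: minimise (w - 3)^2, whose gradient is 2 * (w - 3)
state <- list(weight = 0, mom = 0)
for (i in 1:500) {
  grad  <- 2 * (state$weight - 3)
  state <- sgd_momentum_step(state$weight, state$mom, grad)
}

state$weight  # very close to 3; weight decay pulls it slightly below
```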

Step 9: Predict on the test dataset and calculate accuracy

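preds <- predict(model, test.array)    # 10 x nexample matrix of class probabilities
pred.label <- max.col(t(preds)) - 1    # most probable class per image; columns 1-10 map to digits 0-9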
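For this model, `predict()` returns one column of class probabilities per image, which is why the matrix is transposed before `max.col()` (one row per image) and why 1 is subtracted (column 1 corresponds to digit 0). The accuracy line isn't shown above; assuming the held-out labels are in a vector `test.y` (a hypothetical name), accuracy is just the fraction of matches. A self-contained sketch with made-up probabilities:

```r
# Made-up output for 4 images: 10 rows (digits 0-9) x 4 columns (images)
preds <- matrix(0.01, nrow = 10, ncol = 4)
preds[8, 1] <- 0.91   # image 1 -> digit 7
preds[3, 2] <- 0.91   # image 2 -> digit 2
preds[2, 3] <- 0.91   # image 3 -> digit 1
preds[1, 4] <- 0.91   # image 4 -> digit 0

# One row per image, pick the most probable column, shift 1..10 to 0..9
pred.label <- max.col(t(preds), ties.method = "first") - 1
pred.label
# [1] 7 2 1 0

test.y <- c(7, 2, 1, 9)   # hypothetical true labels
accuracy <- mean(pred.label == test.y)
accuracy
# [1] 0.75
```

`max.col()` breaks ties at random by default; `ties.method = "first"` just makes the sketch reproducible.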

Step 10: Predict on the final test dataset and submit to Kaggle

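# predict on the Kaggle test set (no labels)
testorig <- as.matrix(testorig)
testorig <- t(testorig / 255)  # same scaling and transposition as Step 5 (assumed: division by 255)
testorig.array <- testorig
dim(testorig.array) <- c(28, 28, 1, ncol(testorig))
predskaggle <- predict(model, testorig.array)
predlabel <- max.col(t(predskaggle)) - 1

predictions <- data.frame(ImageId = 1:ncol(testorig), Label = predlabel)
write.csv(predictions, "CNN.csv", row.names = FALSE)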
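The submission file needs a header row and two columns, ImageId (numbered from 1) and Label. A base-R sketch that writes a tiny made-up submission to a temporary file and reads it back to check the format before uploading:

```r
predlabel <- c(2, 0, 9)  # made-up predictions for three images
predictions <- data.frame(ImageId = seq_along(predlabel), Label = predlabel)

out <- tempfile(fileext = ".csv")
write.csv(predictions, out, row.names = FALSE)

# Read the file back and check its shape and columns
check <- read.csv(out)
names(check)
nrow(check)
```

`seq_along()` is used instead of `1:n` so an empty prediction vector gives zero rows instead of the rows 1 and 0.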

And *ba-dum-tsss*!!! A score of 0.99086!

If anybody has any ideas on how to improve this score, please share! TIA!