For about a week or so, I have been working on the ‘Digit Recognizer’ competition over at Kaggle. I started out with my favourite go-to algorithm, Random Forest, and eventually moved on to other implementations and variations, including:

- KNN
- KNN with PCA
- XGBoost
- Deep learning with H2O
- GBM with H2O
- Ensembling

And then I plateaued at **97.8%**.

A quick Google search not too different from ‘improve score + Digit Recognizer + MNIST’ threw up a bunch of pages, all of which seemed to talk about neural networks. I’m like, huh? Isn’t that biology?

Sure is. Who’da thunk it!

Anyway, I spent considerable time poring over a few AMAZING, bookmark-able resources and implemented my first ConvNet (I feel so accomplished!).

**The implementation in question is called LeNet**, one of the best-known convolutional network architectures, originally used to read zip codes, digits, etc.

The model consists of a convolutional layer followed by a pooling layer, another convolution layer followed by a pooling layer, and then two fully connected layers similar to the conventional multilayer perceptrons.
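The shrinking feature-map sizes implied by this architecture can be checked with standard convolution/pooling arithmetic. This is a sketch using the kernel sizes and filter counts configured in Step 7 (5×5 valid convolutions, 2×2 max-pooling with stride 2):

```r
# Output size of a valid (no-padding) convolution or pooling layer:
# out = floor((in - kernel) / stride) + 1
layer_out <- function(in_size, kernel, stride = 1) (in_size - kernel) %/% stride + 1

c1 <- layer_out(28, 5)        # first conv:  28x28 -> 24x24 (20 filters)
p1 <- layer_out(c1, 2, 2)     # first pool:  24x24 -> 12x12
c2 <- layer_out(p1, 5)        # second conv: 12x12 -> 8x8 (50 filters)
p2 <- layer_out(c2, 2, 2)     # second pool: 8x8   -> 4x4
flat <- p2 * p2 * 50          # flattened:   4*4*50 = 800 inputs to the first FC layer
```

So the first fully connected layer sees 800 features before mapping down to 500 hidden units and finally 10 outputs.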

### Step 1: Load libraries

```r
install.packages("drat")
require(drat)
drat::addRepo("dmlc")
install.packages("mxnet")
require(mxnet)
```

### Step 2: Read the datasets

These are available from the Kaggle ‘Digit Recognizer’ competition page here.

Here, every image is represented as a single row of 784 pixel values (28 × 28), each lying between 0 and 255.
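To make the row layout concrete, here is a small sketch that reshapes one such 785-value row (label plus 784 pixels) back into a 28 × 28 matrix; the data here are simulated rather than read from `train.csv`:

```r
# Simulate one MNIST-style row: a label followed by 784 pixel values (0-255)
set.seed(1)
fake_row <- c(3, sample(0:255, 28 * 28, replace = TRUE))

# Drop the label in position 1 and refold the pixels into a 28 x 28 matrix
row_to_image <- function(pixels) matrix(as.numeric(pixels), nrow = 28, ncol = 28)
img <- row_to_image(fake_row[-1])

# image(img[, 28:1], col = grey.colors(256))  # uncomment to plot the digit upright
```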

```r
trainorig <- read.csv("C:/Users/Amita/Downloads/train.csv", header = TRUE, sep = ",")
testorig  <- read.csv("C:/Users/Amita/Downloads/test.csv", header = TRUE, sep = ",")
```

### Step 3: Split off a validation set and convert the datasets into matrices

```r
# The original code relies on a train/validation split of trainorig
# (8,400 validation rows are scored in Step 9); an 80/20 split recovers it.
set.seed(100)
idx   <- sample(nrow(trainorig), 0.8 * nrow(trainorig))
train <- data.matrix(trainorig[idx, ])
test  <- data.matrix(trainorig[-idx, ])
test_org <- test   # keep a labelled copy for the accuracy check in Step 9
```

### Step 4: Extract the labels

```r
train.x <- train[, -1]   # pixel columns
train.y <- train[, 1]    # labels
test    <- test[, -1]    # drop the label column from the validation set
```

### Step 5: Scale and transpose the data

mxnet expects observations in columns rather than rows, so the matrices are transposed after scaling the pixel values to [0, 1].

```r
train.x <- t(train.x / 255)
test    <- t(test / 255)
```

The transposed matrices hold the data as npixels × nexamples: 784 rows, with one column per image.

### Step 6: Convert the matrices into arrays for LeNet

```r
train.array <- train.x
dim(train.array) <- c(28, 28, 1, ncol(train.x))
test.array <- test
dim(test.array) <- c(28, 28, 1, ncol(test))
```

Each input is a 28 × 28 × 1 array representing one image: the first two numbers are the width and height in pixels, and the third is the number of channels (1 for grayscale images, 3 for RGB).
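Assigning to `dim()` just reinterprets the same column-major data, so each column of the pixel matrix becomes one 28 × 28 × 1 slice. A quick self-contained check, with synthetic data in place of the real images:

```r
# Two fake "images" stored the mxnet way: 784 pixel rows, one column per image
fake <- matrix(runif(784 * 2), nrow = 784, ncol = 2)

arr <- fake
dim(arr) <- c(28, 28, 1, ncol(fake))   # same reshape as in Step 6

# The first column of the matrix is exactly the first 28 x 28 slice
identical(as.vector(arr[, , 1, 1]), fake[, 1])
```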

### Step 7: Configure the structure of the network

```r
# Convolutional NN
data <- mx.symbol.Variable('data')
devices <- mx.cpu()
# first conv
conv1 <- mx.symbol.Convolution(data=data, kernel=c(5,5), num_filter=20)
relu1 <- mx.symbol.Activation(data=conv1, act_type="relu")
pool1 <- mx.symbol.Pooling(data=relu1, pool_type="max", kernel=c(2,2), stride=c(2,2))
# second conv
conv2 <- mx.symbol.Convolution(data=pool1, kernel=c(5,5), num_filter=50)
relu2 <- mx.symbol.Activation(data=conv2, act_type="relu")
pool2 <- mx.symbol.Pooling(data=relu2, pool_type="max", kernel=c(2,2), stride=c(2,2))
# first fullc
flatten <- mx.symbol.Flatten(data=pool2)
fc1 <- mx.symbol.FullyConnected(data=flatten, num_hidden=500)
relu3 <- mx.symbol.Activation(data=fc1, act_type="relu")
# second fullc
fc2 <- mx.symbol.FullyConnected(data=relu3, num_hidden=10)
# loss
lenet <- mx.symbol.SoftmaxOutput(data=fc2)
```

### Step 8: Train the model

```r
mx.set.seed(0)
model <- mx.model.FeedForward.create(lenet, X=train.array, y=train.y,
                                     ctx=devices, num.round=20,
                                     array.batch.size=100,
                                     learning.rate=0.05, momentum=0.9,
                                     wd=0.00001,
                                     eval.metric=mx.metric.accuracy,
                                     epoch.end.callback=mx.callback.log.train.metric(100))
```

### Step 9: Predict on the validation set and calculate accuracy

```r
preds <- predict(model, test.array)        # 10 x n matrix of class probabilities
pred.label <- max.col(t(preds)) - 1        # most probable digit per image
sum(diag(table(test_org[, 1], pred.label))) / 8400   # accuracy over the 8,400 validation rows
```
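The `max.col(t(preds)) - 1` idiom works because `predict()` returns a matrix of class probabilities with one column per image: transposing puts images in rows, `max.col` picks the most probable class, and subtracting 1 converts R’s 1-based index to the digits 0–9. A toy 3-class illustration:

```r
# Toy probability matrix: 3 classes (rows) x 4 examples (columns)
probs <- matrix(c(0.7, 0.2,  0.1,
                  0.1, 0.8,  0.1,
                  0.2, 0.2,  0.6,
                  0.9, 0.05, 0.05), nrow = 3)

pred  <- max.col(t(probs)) - 1   # zero-based class per example
truth <- c(0, 1, 2, 1)           # the last example is misclassified
acc   <- sum(diag(table(truth, pred))) / length(truth)
```

Here `pred` comes out as `0 1 2 0`, so three of four predictions match and the accuracy is 0.75.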

### Step 10: Predict on the final test dataset and submit to Kaggle

```r
# predict on the Kaggle test set
testorig <- as.matrix(testorig)
testorig <- t(testorig / 255)    # same scaling and transpose as the training data
testorig.array <- testorig
dim(testorig.array) <- c(28, 28, 1, ncol(testorig))
predtest  <- predict(model, testorig.array)
predlabel <- max.col(t(predtest)) - 1
predictions <- data.frame(ImageId = 1:length(predlabel), Label = predlabel)
write.csv(predictions, "CNN.csv", row.names = FALSE)
```

And *ba-dum-tsss*!!! A **0.99086**!

If anybody has any ideas on how to improve this score, please share! TIA!

References:

- http://cs231n.github.io/
- http://mxnet.io/
- http://josephpcohen.com/w/visualizing-cnn-architectures-side-by-side-with-mxnet/
- http://dmlc.ml/rstats/2015/11/03/training-deep-net-with-R.html