
Handwritten Digit Recognition (MNIST) Using PyTorch


We can teach a computer to recognize handwritten digits using deep learning. Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning.

In this article, we will develop a handwritten digit classifier from scratch using PyTorch.


The dataset we will use is MNIST, a database of handwritten digits with a training set of 60,000 examples and a test set of 10,000 examples. The dataset was originally made available on Yann LeCun's website.

Step 1: Import Necessary Packages

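# Scientific computing
import numpy as np

# PyTorch packages
import torch
from torch import nn
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms  # used for MNIST and the preprocessing below

# Plotting
import matplotlib.pyplot as plt

# Timing the training run and copying the best weights
import time
import copy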

Step 2: Download The Dataset & Preprocessing

Before downloading the data, let us define the transformations that will be applied to the images before they are fed into the pipeline.

transform = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                              ])

transforms.ToTensor() : Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.Tensor (C x H x W) in the range [0.0, 1.0].

transforms.Normalize((0.5,), (0.5,)) : Normalizes a tensor image with mean and standard deviation. Given mean: (mean[1],...,mean[n]) and std: (std[1],...,std[n]) for n channels, this transform normalizes each channel of the input torch.*Tensor, i.e., output[channel] = (input[channel] - mean[channel]) / std[channel]. With a mean and std of 0.5, pixel values in [0.0, 1.0] are mapped to [-1.0, 1.0].
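To make the arithmetic concrete, the short sketch below applies the same two steps by hand with plain torch operations. It does not call torchvision, so it illustrates the formula rather than the transforms themselves, and the pixel values are made up for the example.

```python
import torch

# Hypothetical 8-bit pixel values for a tiny 1 x 2 x 2 "image" (C x H x W).
pixels = torch.tensor([[[0, 64], [128, 255]]], dtype=torch.uint8)

# Step 1, what ToTensor does to the value range: scale [0, 255] to [0.0, 1.0].
scaled = pixels.float() / 255.0

# Step 2, what Normalize((0.5,), (0.5,)) computes: (input - mean) / std.
mean, std = 0.5, 0.5
normalized = (scaled - mean) / std

# The darkest pixel ends up at -1.0 and the brightest at 1.0.
print(normalized.min().item(), normalized.max().item())
```

Centering the inputs around zero in this way is a common preprocessing choice; the values 0.5 and 0.5 are the ones used in this article, not statistics measured on MNIST.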

Download the dataset and apply the transformations:

trainset = datasets.MNIST('data/', download=True, train=True, transform=transform)
valset = datasets.MNIST('data/', download=True, train=False, transform=transform)

DataLoader shuffles the data and provides an iterable over the given dataset, one batch at a time:

trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
valloader = torch.utils.data.DataLoader(valset, batch_size=64, shuffle=True)
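As a rough illustration of what iterating over such a loader yields, the sketch below uses randomly generated stand-in data with MNIST's image shape instead of the real dataset, so it runs without a download. The names fake_images, fake_labels and fake_set are hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with MNIST's shape: 200 grayscale 28 x 28 images, labels 0-9.
fake_images = torch.randn(200, 1, 28, 28)
fake_labels = torch.randint(0, 10, (200,))
fake_set = TensorDataset(fake_images, fake_labels)

loader = DataLoader(fake_set, batch_size=64, shuffle=True)

# Each iteration yields one (images, labels) pair for a whole batch.
batch_shapes = [tuple(images.shape) for images, labels in loader]

print(len(loader))        # number of batches per epoch
print(batch_shapes[0])    # shape of a full batch
print(batch_shapes[-1])   # the last batch holds the remainder
```

By the same arithmetic, the 60,000 training images with a batch size of 64 give 938 batches per epoch, the last of which contains 32 images.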

Step 3: Building The Neural Network

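PyTorch's torch.nn module lets us build the network from a few ready-made layers. The network below has two convolutional blocks, each ending in a max-pooling layer, followed by three fully connected layers that map the 2304 flattened features to 10 class scores, one per digit.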

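class Net(nn.Module):
    # Note: the nn.ReLU() layers are an assumption; the listing as published
    # shows no activation functions between the layers.
    def __init__(self):
        super(Net, self).__init__()
        # Block 1: 1 x 28 x 28 -> 32 x 13 x 13
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
        )
        # Block 2: 32 x 13 x 13 -> 64 x 6 x 6
        self.conv_block2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
        )
        # Classifier: 64 * 6 * 6 = 2304 features -> 10 class scores
        self.fcs = nn.Sequential(
            nn.Linear(2304, 1152),
            nn.ReLU(),
            nn.Linear(1152, 576),
            nn.ReLU(),
            nn.Linear(576, 10),
        )

    def forward(self, x):
        x = self.conv_block1(x)
        x = self.conv_block2(x)
        x = x.reshape(x.shape[0], -1)  # flatten to (batch, 2304)
        x = self.fcs(x)
        return x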

model = Net()
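The input size of the first fully connected layer, 2304, follows from the layer settings. The sketch below traces the spatial size through the network with the standard output-size formula for convolution and pooling layers; out_size is a hypothetical helper written for this illustration, not part of PyTorch.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                   # MNIST images are 28 x 28
size = out_size(size, kernel=3, padding=1)  # first conv of block 1: stays 28
size = out_size(size, kernel=3, padding=1)  # second conv of block 1: stays 28
size = out_size(size, kernel=3, stride=2)   # max pool: 13
size = out_size(size, kernel=3, padding=1)  # first conv of block 2: stays 13
size = out_size(size, kernel=3, padding=1)  # second conv of block 2: stays 13
size = out_size(size, kernel=3, stride=2)   # max pool: 6

# The second block outputs 64 channels, each 6 x 6.
flat_features = 64 * size * size
print(size, flat_features)
```

If the kernel sizes, strides or padding change, the number passed to the first nn.Linear has to change with them.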

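We use Adam as the optimization algorithm and cross-entropy as the loss function. nn.CrossEntropyLoss applies log-softmax to the raw network outputs internally, so the model itself does not need a softmax layer.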

optimizer = optim.Adam(params=model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
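nn.CrossEntropyLoss expects raw scores (logits) and class indices. As a small check of what it computes, the sketch below compares it with the two explicit steps of log-softmax followed by negative log-likelihood. The logits and targets are made-up values, not outputs of the model above.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # made-up raw scores: 4 samples, 10 classes
targets = torch.tensor([3, 0, 7, 9])   # made-up class labels

criterion = nn.CrossEntropyLoss()
loss_direct = criterion(logits, targets)

# The same value in two explicit steps: log-softmax, then negative log-likelihood.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_direct, loss_manual))
```

Because the softmax is folded into the loss, adding an extra softmax layer at the end of the network would apply it twice during training.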

Step 4: Training and Evaluating

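The train_model function below runs a training phase and a validation phase in every epoch and records the weights that achieve the best validation accuracy.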

def train_model(model, criterion, optimizer, scheduler, dataset_sizes, dataloaders, num_epochs=7):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 'train':
                scheduler.step()  # advance the learning-rate schedule once per epoch

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())


    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
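train_model and the call that follows rely on four names that the article never defines: device, exp_lr_scheduler, dataset_sizes and dataloaders. The sketch below shows one way they might be set up. The helper build_training_context is hypothetical, and the StepLR schedule with step_size=7 and gamma=0.1 is an assumption, not a setting confirmed by the article. Tiny stand-in objects are used so that the example runs on its own.

```python
import torch
from torch import nn, optim
from torch.optim import lr_scheduler
from torch.utils.data import DataLoader, TensorDataset


def build_training_context(model, optimizer, trainset, valset, batch_size=64):
    """Create the device, scheduler, dataset sizes and loaders used by train_model."""
    # Use a GPU when one is available, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    # Assumed schedule: multiply the learning rate by 0.1 every 7 epochs.
    scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

    # train_model looks up both dictionaries by the phase names 'train' and 'val'.
    dataloaders = {
        "train": DataLoader(trainset, batch_size=batch_size, shuffle=True),
        "val": DataLoader(valset, batch_size=batch_size, shuffle=False),
    }
    dataset_sizes = {"train": len(trainset), "val": len(valset)}
    return device, scheduler, dataset_sizes, dataloaders


# Stand-ins so the sketch is runnable; with the article's objects the call
# would receive model, optimizer, trainset and valset instead.
demo_model = nn.Linear(4, 2)
demo_optimizer = optim.Adam(demo_model.parameters(), lr=0.001)
demo_train = TensorDataset(torch.randn(10, 4), torch.zeros(10, dtype=torch.long))
demo_val = TensorDataset(torch.randn(6, 4), torch.zeros(6, dtype=torch.long))

device, exp_lr_scheduler, dataset_sizes, dataloaders = build_training_context(
    demo_model, demo_optimizer, demo_train, demo_val, batch_size=4)
print(device, dataset_sizes)
```

Since train_model reads device as a global name, it has to be assigned at module level, as in the last statement of the sketch.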

# num_epochs=7 matches the seven epochs (0 to 6) shown in the log below
model_ft = train_model(model, criterion, optimizer, exp_lr_scheduler, dataset_sizes, dataloaders, num_epochs=7)

Let's look at the results: the model reaches a validation accuracy above 99%.

Epoch 0/6
train Loss: 0.1817 Acc: 0.9446
val Loss: 0.0368 Acc: 0.9890

Epoch 1/6
train Loss: 0.0724 Acc: 0.9808
val Loss: 0.0290 Acc: 0.9914

Epoch 2/6
train Loss: 0.0589 Acc: 0.9841
val Loss: 0.0362 Acc: 0.9909

Epoch 3/6
train Loss: 0.0490 Acc: 0.9873
val Loss: 0.0281 Acc: 0.9925

Epoch 4/6
train Loss: 0.0382 Acc: 0.9897
val Loss: 0.0328 Acc: 0.9907

Epoch 5/6
train Loss: 0.0355 Acc: 0.9907
val Loss: 0.0248 Acc: 0.9932

Epoch 6/6
train Loss: 0.0329 Acc: 0.9917
val Loss: 0.0228 Acc: 0.9942

Training complete in 2m 53s
Best val Acc: 0.994200
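The Acc column in this log is the fraction of samples whose highest-scoring class matches the label, computed in train_model with torch.max and a running count of correct predictions. A minimal sketch of that calculation on made-up scores for four samples and three classes:

```python
import torch

# Made-up logits for 4 samples and 3 classes, plus their true labels.
outputs = torch.tensor([[2.0, 0.1, -1.0],
                        [0.3, 1.5, 0.2],
                        [0.0, 0.2, 3.0],
                        [1.2, 0.9, 0.1]])
labels = torch.tensor([0, 1, 2, 1])

# torch.max along dim 1 returns (values, indices); the indices are the predictions.
_, preds = torch.max(outputs, 1)

# Count matches and divide by the number of samples.
running_corrects = torch.sum(preds == labels)
accuracy = running_corrects.double() / labels.size(0)
print(preds.tolist(), accuracy.item())
```

Here three of the four predictions are correct, giving 0.75. By the same calculation, a validation accuracy of 0.9942 on the 10,000 test images corresponds to 9,942 correctly classified digits.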

Step 5: Visualizing the model predictions

This function displays a single image with an optional title.

def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().squeeze()  # drop the channel dimension: (1, 28, 28) -> (28, 28)
    plt.imshow(inp, cmap='gray_r')
    if title is not None:
        plt.title(title)

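This function predicts the labels of a few images from the validation data and plots each image with its prediction.

def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()  # predictions should be made in evaluation mode
    images_so_far = 0

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images//2, 2, images_so_far)
                ax.axis('off')
                # For MNIST the predicted class index is the digit itself,
                # so no separate class_names list is needed.
                ax.set_title('predicted: {}'.format(preds[j].item()))
                imshow(inputs.cpu().data[j])  # move to the CPU before plotting

                if images_so_far == num_images:
                    model.train(mode=was_training)  # restore the previous mode
                    return
        model.train(mode=was_training)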
Step 6: Saving The Model

Now that we are done with everything, we do not want to lose the trained model. We don’t want to train it every time we use it. For this purpose, we will be saving the model. When we need it in the future, we can load it and use it directly without further training.

torch.save(model, './my_mnist_model.pt') 
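Calling torch.save on the model object pickles the whole module, so loading it later requires the Net class definition to be available. An alternative described in the PyTorch documentation is to save only the state_dict and load it into a freshly constructed model. The sketch below shows that pattern on a small stand-in network rather than the trained Net, and the file name is hypothetical.

```python
import os
import tempfile

import torch
from torch import nn

# Small stand-in network; in the article this would be the trained Net instance.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "my_mnist_model_state.pt")

# Save only the learned parameters.
torch.save(model.state_dict(), path)

# Later: rebuild the same architecture, then load the parameters into it.
restored = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
restored.load_state_dict(torch.load(path))
restored.eval()  # switch to evaluation mode before inference

# Both models should now produce the same outputs for the same input.
x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))
print(same)
```

With the article's network, the restored model would be created with Net() before calling load_state_dict.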

You can find the entire code on GitHub [4].


[1] MNIST.
[2] PyTorch Docs.
[3] Adam.
[4] Entire Code on GitHub.
