MNIST – Deep Learning’s “Hello world!”

A common way to judge how good a deep learning or machine learning algorithm is, is to see how well it performs on the MNIST database.

The MNIST database, or simply MNIST, is a large collection of digitised images of handwritten digits from 0 to 9. Examples of these are shown below. Each image is 28 by 28 pixels in size. The complete training set contains 60,000 images, with a corresponding test set of 10,000 images.
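To get a feel for those dimensions, here is an illustrative sketch (using a made-up blank image, not the real data) of how one MNIST image is typically represented: a 28 by 28 grid of pixel intensities, often flattened into a single list of 784 values before being fed to a learning algorithm.

```python
# An illustrative stand-in for one MNIST image: a 28 x 28 grid of
# pixel intensities from 0 (black) to 255 (white), here all zeros.
image = [[0] * 28 for _ in range(28)]

# Learning algorithms often work on a flattened form: 28 * 28 = 784 values.
flat = [pixel for row in image for pixel in row]

print(len(image), len(image[0]))  # 28 28
print(len(flat))                  # 784
```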


An image showing multiple examples of handwritten digits from zero to nine inclusive taken from the MNIST database
A selection of digitised, handwritten digits from the MNIST database

MNIST is widely used in machine learning research for reasons we will discuss later. The most common use for this dataset is to measure how well computers can automatically read human handwriting. If a computer can read human writing well, then it can be useful in real-world applications, for example automatically sorting mail or reading road signs for self-driving cars. Ideally, these computers should be as accurate as, if not more accurate than, humans.

A brief history of MNIST

MNIST was first introduced to the machine learning community in 1998, in a groundbreaking academic paper titled “Gradient-Based Learning Applied to Document Recognition”. The paper described a type of machine learning algorithm called a convolutional neural network, or CNN for short. The authors demonstrated that this algorithm outperformed all other methods of automatically determining which digit was in each image. In fact, the best-performing version of their algorithm achieved 99.3% accuracy. That means that, out of the 10,000 digits in the test set, their automatic method incorrectly identified only 70 of them.
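To see where the figure of 70 comes from, a reported accuracy can be turned into a count of misclassified test images with a little arithmetic:

```python
# Convert a reported accuracy into a number of misclassified test images.
test_set_size = 10_000
accuracy = 0.993  # i.e. 99.3%

errors = round(test_set_size * (1 - accuracy))
print(errors)  # 70
```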

(For more information on CNNs, check out this fantastic video by Brandon Rohrer, as well as this article for a more in-depth look.)

The method was very successful. So much so that one of the paper’s authors, Yann LeCun, claimed in an interview that the system went on to read 10% to 20% of all financial cheques in the United States. LeCun himself went on to become Head of Artificial Intelligence at Facebook.


A picture of Yann LeCun, who is Facebook's Head of Artificial Intelligence
Yann LeCun

Why is MNIST so popular?

Researchers who study artificial intelligence and machine learning want to focus on their algorithms. They do not want to spend their time cleaning, processing and preparing data for use in their experiments. The MNIST dataset allows them to do just that. To quote from Yann LeCun’s webpage:

“It [MNIST] is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting”

What do we mean by MNIST is deep learning’s “Hello World!”?

A “Hello World!” computer program is a very common introduction to computer programming. It is usually the first program that a person writes when learning a new programming language. The program simply prints the words “Hello World!” to the computer’s screen.

The screenshot below shows an example of such a program written in the Python 3.6 programming language, on Windows 10. On the first line, we see the code print("Hello World!"). On the second line, we see the result of that code: a printout of “Hello World!” to the screen.


The image shows a screenshot of a "Hello World!" program written in Python 3.6 being executed on Windows 10
A simple “Hello World!” program written in the Python 3.6 programming language
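For readers who cannot view the screenshot, the entire program is a single line of Python:

```python
# The classic first program: print a greeting to the screen.
print("Hello World!")
```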

Because MNIST is popular for testing and analysing machine learning algorithms, we feel it has become the “Hello World!” of deep learning! In other words, if you have an algorithm you want to try out, MNIST could be a great place to start.

Measuring how good a deep learning algorithm is

Accuracy is not the only measure of a deep learning algorithm’s performance. How quickly an algorithm learns is also of interest.

For example, imagine we have two algorithms, A and B, for detecting whether or not there is a cat in an image. Both A and B have an accuracy of 98%. That is, loosely speaking, 98% of the time they correctly identify images with cats in them, and 2% of the time they make a mistake.

The difference between them, however, is that A took 10 days to “learn” while B took only 5 days. B trained twice as fast as A. In this case, we would say B is the better algorithm, because it is able to learn faster.
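Sketched in code, using the made-up numbers from the example above: with accuracy tied, we simply prefer whichever algorithm trains fastest.

```python
# Two hypothetical cat detectors with the same accuracy but
# different training times, as in the example above.
algorithms = {
    "A": {"accuracy": 0.98, "training_days": 10},
    "B": {"accuracy": 0.98, "training_days": 5},
}

# With accuracy equal, prefer the algorithm that trains fastest.
best = min(algorithms, key=lambda name: algorithms[name]["training_days"])
print(best)  # B
```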

In fact, there are many more ways to assess an algorithm. Beyond accuracy and training speed, we can look at other metrics such as the AUROC score or the F1 score. But that discussion is beyond the scope of this article.
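As a small taste of one such metric (a sketch only, not a full treatment), the F1 score combines two quantities, precision and recall, into a single number using their harmonic mean. The counts below are made up for illustration.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 score: the harmonic mean of precision and recall."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Made-up example: 90 cats found correctly, 10 false alarms, 10 cats missed.
score = f1_score(90, 10, 10)
print(round(score, 3))  # 0.9
```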

What we want to make clear is that accuracy isn’t everything, especially when deciding which algorithm to use for your task.

Do deep learning algorithms need to be perfect?

The short answer is no. For many applications, your algorithm only has to be good enough.

“Good enough” can mean good enough to boost a company’s profits, or good enough to save human time.

A great example of this appears in the academic paper mentioned earlier. On page 37, the authors describe an automated cheque-reading system. The system was able to read cheques and determine the amount written on each one.

In order to increase the banks’ profits, the system had to be able to read at least 50% of the cheques while making errors at most 1% of the time. The remaining cheques, which the system was unable to read, could be sent to a human instead. The authors’ system actually performed better than this. But the most important thing to recognise is that their system could not read all of the cheques, only most of them. It was not perfect, but it was good enough to save the banks time and money.
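A minimal sketch of that idea (with made-up confidence scores, not the authors’ actual system): the machine only accepts a cheque when it is confident enough in its reading, and hands everything else to a human.

```python
def route_cheques(predictions, threshold=0.95):
    """Split cheque readings into machine-accepted and human-review piles.

    Each prediction is an (amount, confidence) pair; any reading whose
    confidence falls below the threshold is sent to a human instead.
    """
    machine, human = [], []
    for amount, confidence in predictions:
        if confidence >= threshold:
            machine.append(amount)
        else:
            human.append(amount)
    return machine, human

# Made-up example: four cheques with varying machine confidence.
cheques = [(120.00, 0.99), (54.10, 0.97), (300.00, 0.60), (12.50, 0.88)]
accepted, rejected = route_cheques(cheques)
print(len(accepted), len(rejected))  # 2 2
```

Raising the threshold makes the machine more cautious: it reads fewer cheques, but makes fewer mistakes on the ones it does read.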

Thanks for reading!

We hope you have enjoyed reading this article and found it informative. Please do comment below and let us know what you think.
