Multilabel Classification with CNN

AKINTIBU TOSIN OPEYEMI
Dec 18, 2020

Using Convolutional Neural Networks (CNN) with Keras

Image from dreamstime

With the availability of massive amounts of data and computing power, deep learning algorithms keep getting better.

Deep learning algorithms achieve state-of-the-art performance in computer vision tasks, and on some benchmarks they even outperform humans.

In this article, I will try to give you a broad understanding of how to approach an image classification problem. We will tackle a multi-class classification problem with a Convolutional Neural Network (CNN) built using the Keras framework, on a dataset of cups, plates and spoons that I collected locally.

All the code is available in the project's GitHub repository.

Why CNN for Computer Vision?

Simple Neural Network

For the network above, suppose the input image has shape (64, 64, 3) and the second layer has 1000 neurons. The weight matrix of layer 1 then has dimensions W[1] = (1000, 64*64*3) = (1000, 12288), which gives 1000 × 12288 = 12,288,000 weights (about 12 million) for layer 1 alone, before counting its 1000 biases.
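The arithmetic above can be checked with a few lines of Python. The dense-layer figures follow the text (a flattened 64×64×3 input feeding 1000 units); the convolutional layer with 32 filters of size 3×3 is only a hypothetical comparison point, not a layer taken from this article's model.

```python
def dense_params(n_inputs: int, n_units: int, bias: bool = True) -> int:
    """Fully connected layer: one weight per (input, unit) pair, plus one bias per unit."""
    return n_inputs * n_units + (n_units if bias else 0)


def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int,
                  n_filters: int, bias: bool = True) -> int:
    """2-D convolution layer: the count does not depend on the image height or width."""
    return kernel_h * kernel_w * in_channels * n_filters + (n_filters if bias else 0)


n_inputs = 64 * 64 * 3  # flattened (64, 64, 3) image -> 12288 values
print(dense_params(n_inputs, 1000, bias=False))  # 12288000 (weights only)
print(dense_params(n_inputs, 1000))              # 12289000 (weights + biases)
print(conv2d_params(3, 3, 3, 32))                # 896 (hypothetical 32-filter 3x3 layer)
```

Doubling the resolution to 128×128 multiplies the dense weight count by four (to 49,152,000), while the convolutional layer stays at 896 parameters because its filters are reused across the whole image.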

As image resolution increases, the number of parameters in a simple (fully connected) neural network becomes huge. With that many parameters, it is difficult to gather enough data to prevent the network from overfitting, and the computational and memory requirements for training such a model become impractical. Yet for computer vision applications you don't want to be limited to tiny images; you want to use large ones.

Convolutional Neural Networks (CNNs), on the other hand, are built around the convolution operation, which is one of their fundamental building blocks.

Convolutional Neural Network (CNN)


Through convolution operations, CNNs learn low-level features (such as edges) in the initial layers, somewhat higher-level features (such as the shape and orientation of a nose or ears) in the deeper layers, and finally whole objects, such as the complete face of a cat or dog, in the last layers.

CNNs achieve this with:

  • a relatively small number of parameters, which helps prevent overfitting and reduces computational requirements.
  • parameter sharing: a feature detector (such as a vertical edge detector) that is useful in one part of the image is probably useful in another part of the image.
  • sparsity of connections: in each layer, each output value depends only on a small number of inputs.
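The convolution operation and the two properties above can be illustrated with NumPy on a classic toy example: a 6×6 image with a bright left half and a dark right half, and a hand-written 3×3 vertical edge filter. The values are illustrative; in Keras, Conv2D learns its filter weights during training rather than using a fixed filter like this one.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2-D convolution as used in CNNs (technically cross-correlation: no kernel flip)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same 3x3 weights are reused at every position (parameter sharing),
            # and each output value sees only one 3x3 patch (sparse connections).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# 6x6 image: bright left half (10), dark right half (0) -> one vertical edge in the middle
image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)
vertical_edge = np.array([[1, 0, -1]] * 3, dtype=float)

print(conv2d_valid(image, vertical_edge))
# each of the 4 output rows is [0, 30, 30, 0]: large values only around the edge
```

The whole detector has 9 weights no matter how large the image is, which is exactly the saving that the parameter-count comparison for fully connected layers points to.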

The ideas above are taken from Dr. Andrew Ng's Deep Learning Specialization on Coursera. I highly recommend the course if you are passionate about deep learning.

Introduction

In this article, we will solve a multi-class classification problem (cups, spoons and plates) using a Convolutional Neural Network (CNN) built with the Keras framework. Keras is an open-source neural network library written in Python. At the time of writing, it could run on top of TensorFlow, Microsoft Cognitive Toolkit, or Theano; for this project, I run Keras on TensorFlow.

Data Gathering

Data gathering essentially deals with the collection of data needed to solve a given problem. In this project, the data (images of cups, plates and spoons) were gathered using a mobile phone.

A model is only as good as the data it learns from: the data determines how well it is able to make inferences.

The total number of images gathered is 47. Each image contains a single cup, plate or spoon.

Preparation and Pre-processing of Dataset

The dataset for this challenge can be found here. The training set contains 32 images of cups, plates and spoons, and the test set contains 25 images of cups, plates and spoons.

First, we import some libraries.

Next, we label the data. Data labelling is a common data science technique that lets an algorithm map input data to the expected output; here, each image is assigned the class (cup, plate or spoon) it belongs to. The code for this step is available in the GitHub repository.

We then write a load_data function that loads the images and their labels from the folders.
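The article's own load_data lives in the repository and is not reproduced here. Below is a hypothetical version, assuming one subfolder per class (cups, plates, spoons) inside each of the training and test folders, Pillow for reading the photos, and a 150×150 target size. The dictionary built from CLASS_NAMES is one common way to do the labelling step: each class name is mapped to an integer.

```python
import os

import numpy as np
from PIL import Image

# Hypothetical class list: the position of each name becomes its integer label.
CLASS_NAMES = ["cups", "plates", "spoons"]
CLASS_NAMES_LABEL = {name: i for i, name in enumerate(CLASS_NAMES)}
IMAGE_SIZE = (150, 150)  # assumed (width, height) that every photo is resized to
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def load_data(directories, class_names_label=CLASS_NAMES_LABEL, image_size=IMAGE_SIZE):
    """Read <directory>/<class_name>/<image> files; return one (images, labels) pair per directory."""
    output = []
    for directory in directories:
        images, labels = [], []
        for class_name, label in class_names_label.items():
            class_dir = os.path.join(directory, class_name)
            for file_name in sorted(os.listdir(class_dir)):
                if not file_name.lower().endswith(IMAGE_EXTENSIONS):
                    continue  # skip non-image files such as .DS_Store
                with Image.open(os.path.join(class_dir, file_name)) as img:
                    # Force 3 channels and a fixed size so all arrays stack into one tensor.
                    rgb = img.convert("RGB").resize(image_size)
                images.append(np.asarray(rgb, dtype=np.uint8))
                labels.append(label)
        output.append((np.stack(images), np.array(labels, dtype=np.int32)))
    return output


# Usage (train_dir and test_dir are placeholder paths):
# (train_images, train_labels), (test_images, test_labels) = load_data([train_dir, test_dir])
```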

Let's visualize the dataset to see the proportion of images in each class.
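The plotting code itself is not shown in the text. A minimal sketch, assuming integer labels like those returned by the load_data function, could count each class with np.bincount and draw a pie chart; both function names are hypothetical.

```python
import numpy as np


def class_proportions(labels, class_names):
    """Return {class_name: fraction of samples} for integer-encoded labels."""
    counts = np.bincount(labels, minlength=len(class_names))
    return {name: float(count / counts.sum()) for name, count in zip(class_names, counts)}


def plot_proportions(labels, class_names):
    """Pie chart of the class distribution (requires matplotlib)."""
    import matplotlib.pyplot as plt

    counts = np.bincount(labels, minlength=len(class_names))
    plt.pie(counts, labels=class_names, autopct="%1.1f%%")
    plt.title("Proportion of each class")
    plt.show()


# Hypothetical labels: 0 = cups, 1 = plates, 2 = spoons
labels = np.array([0, 0, 1, 2, 2, 2])
print(class_proportions(labels, ["cups", "plates", "spoons"]))
```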

And then we get this output,

Output

Now, let’s see some random images in our dataset.

display_examples(class_names, train_images, train_labels)
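display_examples is the article's own helper, and its code is not shown. The version below is a hypothetical re-implementation with matplotlib that matches the call above: it picks n random images (9 by default, an assumption) and titles each one with its class name, assuming integer labels that index into class_names.

```python
import matplotlib.pyplot as plt
import numpy as np


def display_examples(class_names, images, labels, n=9, seed=None):
    """Show n randomly chosen images in a 3-column grid, each titled with its class name."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=min(n, len(images)), replace=False)
    cols = 3
    rows = int(np.ceil(len(idx) / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(3 * cols, 3 * rows), squeeze=False)
    for ax in axes.flat:
        ax.axis("off")  # hide ticks, including on any unused cells of the grid
    for ax, i in zip(axes.flat, idx):
        ax.imshow(images[i])
        ax.set_title(class_names[labels[i]])
    fig.suptitle("Some examples of images of the dataset")
    plt.show()
    return fig
```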

We then normalize the training and validation sets.
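The normalization code is not shown in the text. A common choice, assumed here, is to scale 8-bit pixel values from [0, 255] to [0, 1]; whatever scaling is used, the same transformation has to be applied to the training and validation images alike.

```python
import numpy as np


def normalize(images: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0] as float32."""
    return images.astype("float32") / 255.0


toy_images = np.array([[[0, 128, 255]]], dtype=np.uint8)  # a toy 1x1 RGB "image"
print(normalize(toy_images))

# Usage with the arrays from the loading step (names assumed):
# train_images, test_images = normalize(train_images), normalize(test_images)
```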

Implementation of a Convolutional Neural Network (CNN) Model Using Keras

Implementing a deep Convolutional Neural Network (CNN) using Keras is easy and fun. We define the model as an instance of Sequential() and then add the layers (Conv2D, MaxPooling2D, Flatten and Dense), using ReLU as the activation function. The loss function is sparse_categorical_crossentropy and the optimizer is Adam.
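The exact architecture (number of layers, filter counts, input size) is not given in the text, so the following is a hypothetical arrangement of the layers named above, assuming 150×150 RGB inputs and three classes. ReLU is passed as the activation argument of the Conv2D and Dense layers here (Keras also offers a separate layers.ReLU layer), and sparse_categorical_crossentropy matches integer labels 0, 1 and 2.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 3              # cups, plates, spoons
INPUT_SHAPE = (150, 150, 3)  # assumed image size; must match the preprocessing step


def build_model(input_shape=INPUT_SHAPE, num_classes=NUM_CLASSES):
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        # softmax turns the 3 outputs into class probabilities that sum to 1
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Integer labels (0, 1, 2) -> sparse categorical cross-entropy
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_model()
model.summary()  # prints a layer-by-layer table of output shapes and parameter counts
```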

Summary of our Deep CNN Model Architecture:

Training of CNN using Training Data

Let's train the model on the training data and monitor its progress on the validation data.

We set the number of epochs (full passes over the training data) to 15 and the batch size to 96.
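With the figures stated in this article (32 training images and a batch size of 96), a single batch already holds the whole training set, so each epoch is one gradient update and 15 epochs give only 15 updates. The helper below (a hypothetical name, not part of Keras) makes that arithmetic explicit. The corresponding Keras call would look roughly like model.fit(train_images, train_labels, batch_size=96, epochs=15, validation_data=(test_images, test_labels)), with the variable names assumed.

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches (weight updates) per epoch; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)


n_train, batch_size, epochs = 32, 96, 15  # figures stated in this article
steps = steps_per_epoch(n_train, batch_size)
print(steps, steps * epochs)  # 1 15
```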

Training Loss, Training Accuracy, Validation Loss, Validation Accuracy after each epoch:

Our training accuracy comes out to 92% and our validation accuracy to 71.43%. I highly recommend training the model on a GPU; the code in the GitHub repository was written on Kaggle and run in a GPU runtime environment.

While training the model, we noticed that the validation loss was increasing, which indicates overfitting. Common ways to reduce overfitting in image classification include early stopping, data augmentation, regularization and dropout. In this project, we use data augmentation.
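A rising validation loss is exactly the signal that early stopping watches for. Keras provides this as tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True); the plain-Python function below is only a simplified sketch of a patience rule of that kind, run on made-up loss values, and not Keras's exact implementation.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed, for a patience rule on validation loss.

    Training stops once the validation loss has failed to beat the best value so far
    (by more than min_delta) for `patience` consecutive epochs. stop_epoch is None
    if that never happens.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical validation losses: improving at first, then rising (the overfitting pattern)
val_losses = [1.05, 0.92, 0.85, 0.88, 0.93, 1.01, 1.10]
print(early_stopping_epoch(val_losses, patience=3))  # (6, 3)
```

With restore_best_weights=True, Keras would roll the model back to the weights from the best epoch (epoch 3 in this made-up run) rather than keep the overfitted ones.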

Data Augmentation

Data Augmentation is a technique used to artificially expand the size of a training dataset by creating modified versions of images in the dataset. The Keras deep learning neural network library provides the capability to fit models using image data augmentation via the ImageDataGenerator class.

Augmentation techniques available with the Keras ImageDataGenerator class include:

  • Random Rotations
  • Random Shifts
  • Random Flips
  • Random Brightness
  • Random Zoom
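To show what some of these transforms do to the pixel array, here is a plain NumPy sketch of a random flip, a random shift and a random brightness change, assuming float images with values in [0, 1]. This is not how ImageDataGenerator implements them (in that class they are configured through parameters such as horizontal_flip, width_shift_range, height_shift_range and brightness_range); rotation and zoom are left out because they need interpolation.

```python
import numpy as np


def random_flip(image, rng):
    """Mirror the image left-right with probability 0.5."""
    return image[:, ::-1] if rng.random() < 0.5 else image


def random_shift(image, rng, max_frac=0.1):
    """Translate by up to max_frac of the height/width, filling uncovered pixels with zeros."""
    h, w = image.shape[:2]
    dy = int(rng.integers(-int(max_frac * h), int(max_frac * h) + 1))
    dx = int(rng.integers(-int(max_frac * w), int(max_frac * w) + 1))
    shifted = np.zeros_like(image)
    # Copy the part of the source that stays inside the frame to its shifted position.
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    shifted[max(0, dy):max(0, dy) + src.shape[0], max(0, dx):max(0, dx) + src.shape[1]] = src
    return shifted


def random_brightness(image, rng, low=0.8, high=1.2):
    """Scale pixel intensities by a random factor, clipped to the valid [0, 1] range."""
    return np.clip(image * rng.uniform(low, high), 0.0, 1.0)


rng = np.random.default_rng(0)
image = rng.random((150, 150, 3))  # hypothetical normalized RGB image
augmented = random_brightness(random_shift(random_flip(image, rng), rng), rng)
```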

Let’s come up with the augmentations we would want to apply to the images.

We first augment the data with horizontal flips, random rotations and random zoom, then visualize the results to see how the images change. We then specify all of our augmentation techniques.
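The project's augmentation code (based on ImageDataGenerator) is in the repository and is not shown here. As an alternative sketch, the same three transforms can be expressed with Keras preprocessing layers (tf.keras.layers.RandomFlip, RandomRotation and RandomZoom, available in TensorFlow 2.6 and later). This is a swapped-in API rather than the ImageDataGenerator pipeline used in the project, and the factor values are assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Flip, rotation and zoom as Keras preprocessing layers (illustrative factors):
# RandomRotation(0.1) rotates by up to +/-10% of a full turn (36 degrees),
# RandomZoom(0.2) zooms in or out by up to 20%.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.2),
])

# One hypothetical 150x150 RGB image with values in [0, 1], as a batch of size 1
image = np.random.default_rng(0).random((1, 150, 150, 3)).astype("float32")

# training=True makes the layers apply random transforms (they pass inputs through at inference)
augmented = [augment(image, training=True)[0].numpy() for _ in range(4)]
```

Each entry of augmented can then be passed to matplotlib's imshow to compare the variants side by side with the original.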

We define train_dir and test_dir, which hold the paths, relative to the root directory, of the folders containing the training and test images.

The flow_from_directory() method allows you to read the images directly from the directory and augment them while the neural network model is learning on the training data.

The method expects that images belonging to different classes are present in different folders but are inside the same parent folder.
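To make the expected layout concrete, here is a small standard-library sketch. When its classes argument is not given, flow_from_directory() treats each subfolder of the parent directory as a class and assigns label indices in alphanumeric order of the folder names; the hypothetical helper below mirrors that rule. In Keras, the mapping actually used is available afterwards as the generator's class_indices attribute.

```python
import os
import tempfile


def infer_class_indices(directory):
    """Map each subfolder name to a label index in alphanumeric order,
    mirroring how flow_from_directory() infers classes when `classes` is not given."""
    subdirs = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(subdirs)}


# Hypothetical layout: train_dir/<class_name>/<image files>
with tempfile.TemporaryDirectory() as train_dir:
    for class_name in ["spoons", "cups", "plates"]:
        os.makedirs(os.path.join(train_dir, class_name))
    print(infer_class_indices(train_dir))  # {'cups': 0, 'plates': 1, 'spoons': 2}
```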

The following are a few important parameters of this method:

1. directory: the path to the parent folder that contains one subfolder per class

2. target_size: the size (height, width) to which every input image is resized

3. class_mode: "binary" yields 1-D binary labels; we used "categorical" instead, which yields 2-D one-hot encoded labels, because we have three classes.
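With class_mode="categorical" the labels arrive as one-hot rows, whereas integer labels go with sparse_categorical_crossentropy. Both encodings carry the same information, so a model trained on one-hot labels from the generator should be compiled with the matching Keras loss, categorical_crossentropy. The NumPy sketch below, using hypothetical softmax outputs, shows the encoding and that the two losses agree.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Convert integer labels of shape (n,) to one-hot rows of shape (n, num_classes)."""
    return np.eye(num_classes, dtype="float32")[labels]


def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log p(true class), with integer labels."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))


def categorical_crossentropy(onehot, probs):
    """Mean of -sum(y * log p), with one-hot labels."""
    return float(-np.mean(np.sum(onehot * np.log(probs), axis=1)))


labels = np.array([0, 2, 1])          # integer labels: cups, spoons, plates
probs = np.array([[0.7, 0.2, 0.1],    # hypothetical softmax outputs
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])
print(one_hot(labels, 3))
print(sparse_categorical_crossentropy(labels, probs),
      categorical_crossentropy(one_hot(labels, 3), probs))  # the two values are equal
```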

The next step is to build the CNN model and compare its performance with and without augmentation.

Let’s create the architecture for the CNN model.

Now that we have created the architecture for our model, we can compile it and start training it.

After 15 epochs we get the following loss and accuracy for the model on the augmented data.

Training and Validation Accuracy
Training and Validation Loss

As you can see, the training and validation losses both decrease, and the training and validation accuracies increase together. That is the power of data augmentation: it reduced the overfitting we observed earlier.
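Curves like these can be drawn from the history dictionary that Keras stores on the object returned by model.fit(). The sketch below assumes the model was compiled with metrics=["accuracy"] and trained with validation data, so that the keys loss, val_loss, accuracy and val_accuracy are present; plot_history is a hypothetical name.

```python
import matplotlib.pyplot as plt


def plot_history(history):
    """Plot training vs. validation accuracy and loss per epoch.

    `history` is the dict from model.fit(...).history, with keys such as
    'accuracy', 'val_accuracy', 'loss' and 'val_loss'.
    """
    epochs = range(1, len(history["loss"]) + 1)
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(12, 4))
    ax_acc.plot(epochs, history["accuracy"], label="Training accuracy")
    ax_acc.plot(epochs, history["val_accuracy"], label="Validation accuracy")
    ax_acc.set_title("Training and Validation Accuracy")
    ax_loss.plot(epochs, history["loss"], label="Training loss")
    ax_loss.plot(epochs, history["val_loss"], label="Validation loss")
    ax_loss.set_title("Training and Validation Loss")
    for ax in (ax_acc, ax_loss):
        ax.set_xlabel("Epoch")
        ax.legend()
    plt.show()
    return fig


# Usage (names assumed): history = model.fit(...); plot_history(history.history)
```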

Gratitude

This article describes my final project as a mentee of She Code Africa in the Data Science Track. My sincere appreciation goes to my ever-supportive mentor, Oladipupo Joseph, for the unrelenting effort and encouragement that made this a success.
