Classifying dog breeds: a deep learning approach

Matheus Cafalchio
7 min read · Feb 15, 2022

Final project — Udacity Data Scientist

Project Definition

This project is an image classification problem and the final project of the Udacity Data Scientist Nanodegree. It consists of creating an app that classifies an image of a dog into its respective breed. The final app should also recognize that a picture of a human is not a dog, yet still map the human image to the dog breed it most resembles, and it should return an error message if the input is neither a human nor a dog.

The problem

We are given 133 dog breed classes to classify the images into. The training set covers the same 133 classes, but with a different number of images per class. Our goal is to reach at least 1% accuracy with a simple model built from scratch and at least 60% accuracy using transfer learning.

Metrics

There are many metrics commonly used to evaluate classification models. The metric used here was accuracy, because it was the one requested in the problem statement. Accuracy is a statistical measure of how well a classification test correctly identifies or excludes a condition.

Accuracy = (True positives + True negatives) / Total number of samples

Accuracy is the proportion of correct predictions (both positive and negative) among all cases examined. The problem with accuracy is that it does not give a clear picture of false positives and false negatives. For example, suppose we have a training dataset with two classes: class 1 with 999 images and class 2 with 1 image. A model that always predicts class 1 gets 99.9% accuracy while being useless as a classifier.

So, for imbalanced training sets, we need to account for false positives and false negatives as well. Better metrics for this type of dataset are, for example, precision-recall curves and ROC AUC, which take both the true positive rate and the false positive rate into account.
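A quick sketch of this effect with scikit-learn (not part of the original notebook), using the 999-vs-1 toy split described above:

```python
# Illustration of why accuracy can mislead on an imbalanced dataset:
# a model that always predicts the majority class scores 99.9% accuracy
# but has zero precision and recall on the minority class.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([0] * 999 + [1])        # 999 samples of class 0, 1 of class 1
y_pred = np.zeros(1000, dtype=int)         # the "always predict class 0" model

print(accuracy_score(y_true, y_pred))                     # 0.999
print(precision_score(y_true, y_pred, zero_division=0))   # 0.0
print(recall_score(y_true, y_pred))                       # 0.0
```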

The Data

The dataset has 133 classes, one per breed, with a total of 6,680 images for training, 835 for validation and 836 for testing. The number of images per class varies from 77 for the Alaskan Malamute to 26 for the Xoloitzcuintli, with a mean of 50 images per class and a standard deviation of 11. The preprocessing consisted of first making sure that all images were 224x224x3 and then converting each one to a float32 array of shape [224, 224, 3], normalized by 255, the highest value an 8-bit pixel can take.
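A minimal sketch of that preprocessing step, assuming Keras is used (the function name and image path are illustrative, not the project's exact code):

```python
# Load an image, resize it to 224x224x3, convert to float32 and scale by 255.
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array

def path_to_tensor(img_path):
    # Resize to the 224x224 RGB shape expected by the networks
    img = load_img(img_path, target_size=(224, 224))
    x = img_to_array(img).astype(np.float32)      # shape (224, 224, 3)
    return np.expand_dims(x, axis=0) / 255.0      # add batch dim, normalize to [0, 1]

# Example usage (path is a placeholder):
# batch = path_to_tensor('images/Labrador_retriever_06457.jpg')
```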

Classes of dog breeds x number of images in each class

Also, we can see that both the train and validation datasets are not clean: some images focus more on a human than on the dog, and some contain more than one breed in the same image. The dataset was not cleaned for this project.

Strategies to solve the problem

Image classification is one of the many practical uses of machine learning and, more specifically, deep learning. Different model architectures have been proposed to extract information from images, but by far the most common are Convolutional Neural Networks (CNNs). Convolutions are used to extract features from images: in a CNN, the 2D filters are learned during the training phase to extract the most useful information, which the rest of the model then uses to reduce the objective function. This way, by the end of training, each convolutional layer has learned to decode different features of the same image. We will describe how to use a CNN to classify dog breeds.

2D convolution in action: the white squares are the padding, necessary to keep the same number of squares after the convolution; the blue squares are the image and the green ones are the resulting feature map. (Image from Wikipedia.)
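A tiny Keras example (illustrative only, not from the project) of a single convolutional layer with 'same' padding, matching the padding shown in the figure:

```python
# A 3x3 convolution with 'same' padding keeps the spatial size of the input;
# each of the 16 filters produces one feature map.
import numpy as np
from tensorflow.keras import layers

conv = layers.Conv2D(filters=16, kernel_size=3, padding='same', activation='relu')
dummy_image = np.random.rand(1, 224, 224, 3).astype('float32')  # batch of one RGB image
features = conv(dummy_image)
print(features.shape)  # (1, 224, 224, 16): one feature map per learned filter
```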

The first idea was to train a model from scratch. Two CNN models were created, and after training both performed poorly. The problem was not the models themselves but the small number of training images. The best solution was to use transfer learning to get the most out of our few images.

Modeling

Create and train models

As mentioned before, two models were trained from scratch: a simple model and a bigger model with residual connections.

Single CNN model with 1.32% accuracy

Both models had low accuracy. The larger model was trained for 10 epochs and reached 0.95% accuracy, which is better than random guessing (about 0.75% for 133 classes) but still far from ideal. The smaller model reached 1.32% accuracy.
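For reference, a hedged sketch of what a small from-scratch CNN along these lines can look like in Keras (layer sizes are illustrative, not the exact architecture used in the notebook):

```python
# Small CNN trained from scratch: stacked convolution + pooling blocks,
# global pooling, and a 133-way softmax head (one output per dog breed).
from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(16, 3, activation='relu', padding='same'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation='relu', padding='same'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu', padding='same'),
    layers.GlobalAveragePooling2D(),
    layers.Dense(133, activation='softmax'),
])
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
```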

Transfer learning

Although any model can be trained from scratch, when few images are available a very interesting way to solve the problem is transfer learning. The idea is to reuse a model that was already trained on a different (but similar enough) dataset, so that the information stored in its weights can be reused on your own data. The technique is so common that the major deep learning libraries already ship collections of these pre-trained models, ready to be imported and used. For image classification (our case), both PyTorch and Keras offer many pre-trained models for download.
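A sketch of extracting bottleneck features from a pre-trained VGG16 in Keras; this mirrors the general bottleneck-feature approach and is an assumption about the setup, not a copy of the project's exact code:

```python
# Load the VGG16 convolutional base pre-trained on ImageNet (no classification head)
# and run images through it once to obtain "bottleneck" feature maps.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

base = VGG16(weights='imagenet', include_top=False)

def extract_bottleneck_features(image_batch):
    # image_batch: float array of shape (n, 224, 224, 3) in the original 0-255 range
    return base.predict(preprocess_input(np.copy(image_batch)))
```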

The graph below shows an updated (14/02/2022) benchmark of image models, plotting accuracy against year for CNN models trained and tested on ImageNet.

Accuracy per year

Two models were chosen to classify the dogs: InceptionV3 and VGG16. InceptionV3 is a larger model with better accuracy, while VGG16 is a smaller model that can be trained quite fast with decent accuracy.

The best model was a mix of both InceptionV3 and VGG16, with a layer of Gaussian noise after each of the Inception and VGG inputs.

Our best result was obtained using a learning rate scheduler. We first trained for 20 epochs using RMSprop as the optimizer with lr=0.0005, then for 16 more epochs starting at lr=0.0002 with a learning rate scheduler callback, a simple function that changes the learning rate during training.
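A sketch of such a callback in Keras; the starting value mirrors the text above, but the exact schedule function is an assumption, not the project's code:

```python
# A LearningRateScheduler callback computes the learning rate at the start of
# each epoch from a user-supplied function.
from tensorflow.keras.callbacks import LearningRateScheduler

def schedule(epoch, lr):
    # Start at 2e-4 and halve the learning rate every 5 epochs (illustrative rule)
    return 2e-4 * (0.5 ** (epoch // 5))

lr_callback = LearningRateScheduler(schedule, verbose=1)
# model.fit(x_train, y_train, epochs=16, callbacks=[lr_callback], ...)
```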

Results

In both models two approaches were tested: the first was to train the model as-is, using bottleneck features and a Dense layer; the second was to implement a form of augmentation. Because bottleneck features were used (so the original images were no longer available to augment), the way found to improve accuracy was adding a Gaussian noise layer, which improved accuracy by about 6 percentage points using 0.3 as the noise parameter.
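A hedged sketch of that noise-as-augmentation idea on VGG16 bottleneck features (the head layers and feature shape are assumptions; the 0.3 value is the one reported in the text):

```python
# Classification head trained on pre-computed bottleneck features, with a
# GaussianNoise layer acting as a cheap form of augmentation (active only in training).
from tensorflow.keras import Sequential, layers

head = Sequential([
    layers.Input(shape=(7, 7, 512)),   # VGG16 bottleneck shape for a 224x224 input
    layers.GaussianNoise(0.3),         # standard deviation of the added noise
    layers.GlobalAveragePooling2D(),
    layers.Dense(133, activation='softmax'),
])
head.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
```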

VGG16 original → 50.24 % accuracy
VGG +noise → 56.82 % accuracy

We also tested the noise for InceptionV3; the best result was with 0.1.

InceptionV3 → 82.06% accuracy

To increase the accuracy further, another model was created: one that combines both InceptionV3 and VGG16 using concatenation.

x = layers.concatenate([inceptionV3_output, VGG16_output])
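Expanding on that line, a hedged sketch of how the mixed model can be assembled with the Keras functional API; the feature shapes, pooling, and noise values are assumptions based on the text, not a copy of the project's exact architecture:

```python
# Two bottleneck-feature inputs, Gaussian noise on each branch, pooling,
# concatenation, and a shared 133-way softmax head.
from tensorflow.keras import Model, layers

vgg16_input = layers.Input(shape=(7, 7, 512), name='vgg16_features')
inception_input = layers.Input(shape=(5, 5, 2048), name='inceptionv3_features')

vgg16_branch = layers.GlobalAveragePooling2D()(layers.GaussianNoise(0.3)(vgg16_input))
inception_branch = layers.GlobalAveragePooling2D()(layers.GaussianNoise(0.1)(inception_input))

x = layers.concatenate([inception_branch, vgg16_branch])
output = layers.Dense(133, activation='softmax')(x)

mixed_model = Model([vgg16_input, inception_input], output)
mixed_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
```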

The mixed model was more accurate than either of them alone:

mixed → 86.24 % accuracy

Testing with different images:

When implementing a classification app, it would be nice to have a model that does not throw an error when a random image is used as input. Moreover, we can use different models (one for each type of possible image) to detect and classify correctly. We tested the OpenCV library for human detection: OpenCV provides many pre-trained face detectors, and we used one of them to detect whether the input images were human or not.

We used a pre-trained Haar cascade model, which gave 100% accuracy in detecting whether an image is of a human or not:
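A minimal sketch of face detection with one of OpenCV's pre-trained Haar cascades (the cascade file name is one of several shipped with OpenCV; the helper name and path are illustrative):

```python
# Detect faces with a pre-trained Haar cascade; return True if at least one face is found.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_alt.xml')

def is_human(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # cascades operate on grayscale
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0

# Example usage (path is a placeholder):
# print(is_human('images/sample_human.jpg'))
```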

Some examples of the model in action.

Conclusion and Improvements

Doing this project was a fun and instructive experience. The result is a functional app that (almost always) classifies dog breeds correctly. The results are satisfactory, but they could be improved.

I would suggest getting more images for each class. Although transfer learning can give good results with few images, more images would lead to better classification. I also did not use data augmentation, which could improve accuracy by forcing the model to extract more meaningful features.

Another way to improve would be to optimize the hyperparameters automatically using a grid search or something similar. After reaching 86% accuracy over many improvement iterations I considered the job done, but there are always new ways to improve. Lastly, a better base model could be used (one of those cited in the benchmark figure above).

The entire code for this project can be found here!

https://github.com/cafalchio/Data-Science-project/blob/main/dog_app.ipynb
