Can you trust a Keras model to distinguish African elephant from Asian elephant?



I cannot lie: I only learned to distinguish an African elephant from an Asian elephant recently.

It is said that you can tell where an elephant comes from by looking at the size of its ears. African ears are shaped like a map of Africa, while Asian ears are smaller, like the shape of India. African ears are also much bigger and reach up and over the neck, which does not occur in Asian elephants.

 -- EleaidCharity

On the other hand, the state-of-the-art ImageNet classification model can detect 1000 classes of objects at an accuracy of 82.7%, including, of course, those two types of elephants. ImageNet models are trained on 14 million images and can figure out the differences between objects.


Have you ever wondered where the model is focusing when it looks at an image? Or, to put it another way, should we trust the model at all?

There are two approaches we can take to solve the puzzle.

The hard way. Crack open the state-of-the-art ImageNet model by studying the paper, figuring out the math, implementing the model, and hopefully, in the end, understanding how it works.


The easy way. Become model-agnostic and treat the model as a black box. We control the input image, so we tweak it. We change or hide parts of the image that make sense to us, then feed the tweaked image to the model and see what it thinks of it.
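To make this tweak-and-hide idea concrete, here is a tiny sketch of my own (not LIME's actual code) of perturbing an image by greying out regions, assuming the image has already been split into superpixels, e.g. by skimage.segmentation.quickshift:

```python
import numpy as np

def perturb(image, segments, active):
    """Keep the superpixels flagged in `active`; grey out the rest.

    `segments` assigns each pixel a superpixel id (e.g. produced by
    skimage.segmentation.quickshift); `active` is a 0/1 vector with
    one entry per superpixel.
    """
    keep = np.isin(segments, np.flatnonzero(active))
    out = image.copy()
    out[~keep] = 0.5  # hide the inactive regions with a neutral grey
    return out

# Toy example: a 2x2 image split into two superpixels; hide the second one.
image = np.full((2, 2, 3), 0.9)
segments = np.array([[0, 0], [1, 1]])
perturbed = perturb(image, segments, active=np.array([1, 0]))
```

By generating many such perturbed images and watching how the prediction changes, an explainer can work out which regions matter most to the model.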

The second approach is what we will experiment with, and it has been made easy by a wonderful Python library: LIME, short for Local Interpretable Model-Agnostic Explanations.

Install it with pip as usual:

pip install lime

Let's jump right in.

Choose the model you want to develop some trust with

There are many Keras models for image classification with weights pre-trained on ImageNet. You can pick one here at Available models.

I am going to try my luck with InceptionV3. It might take a while to download the pre-trained weights the first time.

from keras.applications import inception_v3 as inc_net
inet_model = inc_net.InceptionV3()

Find the top 5 predictions with an input image

We choose a photo with two elephants walking side by side; it's a great example to test our model with.

The code pre-processes the image for the model, and the model makes the prediction.

from keras.applications.inception_v3 import decode_predictions

images = transform_img_fn([os.path.join('data','asian-african-elephants.jpg')])
preds = inet_model.predict(images)
for x in decode_predictions(preds)[0]:
    print(x)
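The `transform_img_fn` helper above is not defined in the snippet; here is one possible sketch using PIL and NumPy directly, replicating InceptionV3's preprocessing (resize to 299x299 and scale pixels to [-1, 1]). In practice you could equally build it on keras.preprocessing.image plus inc_net.preprocess_input:

```python
import numpy as np
from PIL import Image

def transform_img_fn(path_list, target_size=(299, 299)):
    """Load images, resize them to InceptionV3's input size, and scale
    pixel values to [-1, 1], which is what inc_net.preprocess_input does."""
    batch = []
    for path in path_list:
        img = Image.open(path).convert('RGB').resize(target_size)
        x = np.asarray(img, dtype=np.float32) / 127.5 - 1.0
        batch.append(x)
    return np.stack(batch)
```

The result has shape (n, 299, 299, 3), ready to pass to inet_model.predict.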

The output is not so surprising: the Asian, a.k.a. the Indian, elephant is standing in front and takes up quite a lot of the image; no wonder it gets the highest score.

('n02504013', 'Indian_elephant', 0.9683744)
('n02504458', 'African_elephant', 0.01700191)
('n01871265', 'tusker', 0.003533815)
('n06359193', 'web_site', 0.0007669711)
('n01694178', 'African_chameleon', 0.00036488983)

Explain to me what you are looking at

The model is making the right prediction; now let's ask it to give us an explanation.

We do this by first creating a LIME explainer; explaining an instance only requires our test image and the model.predict function.

import lime
from lime import lime_image

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(images[0], inet_model.predict,
                                         top_labels=5, hide_color=0,
                                         num_samples=1000)

Let's see what the model is looking at when predicting the Indian elephant. The model.predict function outputs a probability for each of the 1000 classes, with class indices ranging from 0 to 999.

And the LIME explainer needs to know which class, by class index, we want an explanation for.

I wrote a simple function to make this easy:

Indian_elephant = get_class_index("Indian_elephant")
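My `get_class_index` helper could be written along these lines (a sketch, with an extra `class_index` argument for testability; the URL is the class-index file Keras itself uses for decode_predictions):

```python
import json

def get_class_index(class_name, class_index=None):
    """Return the integer ImageNet class index for a human-readable name.

    `class_index` maps index strings to [WordNet id, name] pairs, in the
    format of Keras's imagenet_class_index.json. If omitted, download
    that file via Keras.
    """
    if class_index is None:
        from keras.utils import get_file  # deferred: only needed for download
        path = get_file(
            'imagenet_class_index.json',
            'https://storage.googleapis.com/download.tensorflow.org/'
            'data/imagenet_class_index.json')
        with open(path) as f:
            class_index = json.load(f)
    for idx, (wordnet_id, name) in class_index.items():
        if name == class_name:
            return int(idx)
    raise ValueError(f'No ImageNet class named {class_name!r}')

# A tiny slice of the real mapping, enough to illustrate the lookup:
sample = {"385": ["n02504013", "Indian_elephant"],
          "386": ["n02504458", "African_elephant"]}
print(get_class_index("Indian_elephant", sample))  # → 385
```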

It turns out the class index of "Indian_elephant" is 385. Let's ask the explainer to show us where the model is focusing.

import matplotlib.pyplot as plt
from skimage.segmentation import mark_boundaries

temp, mask = explanation.get_image_and_mask(Indian_elephant, positive_only=True, num_features=5, hide_rest=True)
plt.imshow(mark_boundaries(temp / 2 + 0.5, mask))


This is interesting: the model is paying attention to the small ear of the Indian elephant.

What about the African elephant?

African_elephant = get_class_index("African_elephant")
temp, mask = explanation.get_image_and_mask(African_elephant, positive_only=True, num_features=5, hide_rest=True)
plt.imshow(mark_boundaries(temp / 2 + 0.5, mask))


Cool, the model is also looking at the big ear of the African elephant when predicting it. It is looking at part of the text "AFRICAN ELEPHANT" as well. Is this a coincidence, or is the model smart enough to pick up clues by reading annotations on the image?

And finally, let's take a look at the 'pros and cons' when the model is predicting an Indian elephant.

temp, mask = explanation.get_image_and_mask(Indian_elephant, positive_only=False, num_features=10, hide_rest=False)
plt.imshow(mark_boundaries(temp / 2 + 0.5, mask))

(pros in green, cons in red)


It looks like the model is focusing on the same things we do.

Summary and Further reading

So far, you might still be agnostic about how the model works, but at least you have developed some trust in it. That is no small thing, since you now have one more tool to help you distinguish a good model from a poor one. A bad model, for example, might focus on irrelevant background when predicting an object.

I am barely scratching the surface of what LIME can do. I encourage you to explore other applications, such as text models, where LIME will tell you which parts of the text the model focuses on when making a decision.

Now go ahead and re-examine some deep learning models before they betray you.

LIME GitHub repository

Introduction to Local Interpretable Model-Agnostic Explanations (LIME)

My full source code for this experiment is available here in my GitHub repository.
