Gentle guide to setting up the Keras deep learning framework and building a travel recommendation engine (Part 2)


[Image: sunset-resort]

Let's continue our journey to build a travel recommendation engine. You can find part 1 of the series on my blog. After reading this post, you will know how to turn a model trained for classification into one that extracts image feature vectors. Then we'll walk through how to compute the similarity between two images from their feature vectors. Finally, we will generate a travel recommendation by finding the most similar image.

For the best learning experience, I suggest opening the Colab Notebook while reading this tutorial.

The engine we are going to build is a content-based recommendation engine: if a user likes a destination photo, the system will show them an image of a similar travel destination.

[Image: recommend]

From classifier to feature extractor

In our previous post, the model was built to classify an input image as one of the 365 place/scene names.

We are going to remove the last 4 layers, which are responsible for generating the place logits, and keep only the "feature extractor" part of the network.

[Figure: feature-extractor]

In Keras we can pop out the last 4 layers like this.

from keras.optimizers import SGD

# Pop the 4 classifier layers off the top of the network.
model.layers.pop()
model.layers.pop()
model.layers.pop()
model.layers.pop()

# Rewire the model so the last remaining layer becomes the output.
model.outputs = [model.layers[-1].output]
model.output_layers = [model.layers[-1]]
model.layers[-1].outbound_nodes = []

# Recompile so the changes take effect, then save the truncated weights.
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy')
model.save_weights('models/places/places_vgg_keras_notop.h5')

The last line above saves the model weights to a file for later use. Next time we only need to define the model without the 4 classifier layers and initialize the network with the saved weights.
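
For example, a later session could look something like this (a sketch only; build_places_vgg_notop() stands in for whatever function defines the Part 1 architecture minus the 4 classifier layers):

# build_places_vgg_notop() is a hypothetical helper that recreates the
# Part 1 architecture without the 4 classifier layers.
model = build_places_vgg_notop()
model.load_weights('models/places/places_vgg_keras_notop.h5')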

If you run model.predict again, you will notice that the output is no longer a vector of 365 floating point numbers. Instead, it is now a vector of 4096 floating point numbers: the feature vector, an abstract representation of the input image.
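
A quick sanity check (assuming x holds a preprocessed image batch, prepared as in Part 1):

features = model.predict(x)
print(features.shape)  # now (1, 4096) instead of (1, 365)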

Recommend similar images

Our recommendation engine takes a query image the user likes and recommends a similar place.

The similarity between two images is computed by measuring the distance between the two feature vectors.

You can imagine measuring the distance between two feature vectors as measuring the distance between two points in a 4096-dimensional space. The smaller the distance, the more similar the two images are to each other.
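
For instance, with SciPy the distance can be computed like this (a toy example with random vectors standing in for real image features):

import numpy as np
from scipy.spatial import distance

# Two random 4096-dimensional vectors standing in for image features.
a = np.random.rand(4096)
b = np.random.rand(4096)
print(distance.euclidean(a, b))  # a smaller distance means more similar images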

Precompute image features

We could compute every known image's feature vector at runtime and compare it with the query image's feature vector, but this would be inefficient since we would be computing the same values again and again. A faster approach is to pre-compute those feature vectors and store them in memory. At runtime, we only need to compute the query image's feature vector if it has not been computed before, which saves a lot of time, especially when you have lots of images to compare.

Here is the function to compute an image's feature vector by calling the feature extractor model's predict function.

import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

def compute_features(img_path):
    '''
    Compute and return the feature vector for a
    given image path.
    '''
    # Load the image at the input size the network expects.
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)   # add the batch dimension
    x = preprocess_input(x)         # same preprocessing as during training
    features = model.predict(x)[0]  # 4096-d feature vector
    return features

And here is another function that pre-computes all known images' feature vectors and stores them in memory; this only needs to be done once.

import glob
import os
from collections import namedtuple

# A Tag bundles a place name with its image path and feature vector.
Tag = namedtuple('Tag', ['name', 'path', 'features'])

tag_images_folder = 'images/tags'
tag_images = list(glob.iglob(os.path.join(tag_images_folder, '*.*')))

tags = []
for img_path in tag_images:
    # Use the file name (without extension) as the place name.
    name = os.path.splitext(os.path.basename(img_path))[0]
    features = compute_features(img_path)
    tag = Tag(name=name, path=img_path, features=features)
    tags.append(tag)

Get the most similar image

Here is the function we execute at runtime to find and display the image most similar to a new query image.

from scipy.spatial import distance
import matplotlib.pyplot as plt
import matplotlib.image as mpimg


def get_similar_photo(img_path):
    # Compute the query image's feature vector, then measure its
    # Euclidean distance to every pre-computed tag feature vector.
    target_features = compute_features(img_path)
    distances = []
    for tag in tags:
        distances.append(distance.euclidean(tag.features, target_features))

    # The closest tag is the recommendation.
    min_distance_value = min(distances)
    min_distance_index = distances.index(min_distance_value)
    tag = tags[min_distance_index]

    # Show the query image and the recommendation side by side.
    fig = plt.figure()
    a = fig.add_subplot(1, 2, 1)
    plt.imshow(mpimg.imread(img_path))
    a.set_title('Query')
    a = fig.add_subplot(1, 2, 2)
    plt.imshow(mpimg.imread(tag.path))
    a.set_title('Recommend: ' + tag.name)
    plt.show()
    return tag, min_distance_value

Let's give it a try by running the following line.

get_similar_photo("images/canyon2.jpg")[0].name

And here is the result: our model recommends a photo similar to our query image.

[Image: similar-image]

Further improvements

If you play with the recommendation engine, you may notice it generates wrong recommendations once in a while.

There are two reasons:

1. The model was trained for classification, so the image feature extractor part of the network was optimized for classifying images into 365 classes, not for distinguishing similar images.

2. The model was trained on an image dataset spread across 365 classes of places. The training set might not have enough images for a particular type of beach, or for one place across different seasons.

One solution to the first problem is to use a siamese network with triplet loss, which is popular in face verification tasks. The model would be trained to identify whether two images are from the same place. You can check out the Coursera video introduction to this concept, which I found very helpful.
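
To give a flavor of the idea (a sketch only, not part of this tutorial's code), a triplet loss could be written in Keras like this, assuming the anchor, positive, and negative 4096-dimensional embeddings are concatenated along the last axis of y_pred:

from keras import backend as K

def triplet_loss(y_true, y_pred, alpha=0.2):
    # y_pred is assumed to pack three 4096-d embeddings side by side:
    # [anchor | positive | negative].
    anchor = y_pred[:, 0:4096]
    positive = y_pred[:, 4096:8192]
    negative = y_pred[:, 8192:12288]
    # Squared Euclidean distances anchor/positive and anchor/negative.
    pos_dist = K.sum(K.square(anchor - positive), axis=-1)
    neg_dist = K.sum(K.square(anchor - negative), axis=-1)
    # Push same-place pairs at least alpha closer than different-place pairs.
    return K.maximum(pos_dist - neg_dist + alpha, 0.0)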

The solution to the second problem is to apply transfer learning: "freeze" the earlier convolutional layers and train the remaining model parameters on our custom image dataset. Transfer learning is a great way to leverage the general features learned from large image datasets when training a new image model.
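
In Keras, freezing layers is straightforward (another sketch; the cutoff below is arbitrary and would need tuning for the Part 1 architecture):

# Freeze all but the last few layers, then retrain on the custom dataset.
for layer in model.layers[:-4]:  # arbitrary cutoff, tune for your network
    layer.trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit(...) on the custom travel image dataset would follow here.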

Conclusion

By now you have gotten a taste of the potential of deep learning, along with hands-on experience building and running a Keras model. The journey to master any technology is not easy, and deep learning is no exception. That is what motivated me to create this blog: to share what I have learned on the way to becoming better at applying deep learning to real-life problems. Don't hesitate to reach out to me if you are looking for a solution or simply to say hello.

