How to run a Keras model on Jetson Nano

The Jetson Nano Developer Kit, announced at GTC 2019 for $99, brings a new rival to the arena of edge computing hardware alongside its pricier predecessors, the Jetson TX1 and TX2. Its arrival gives Nvidia a competitive advantage over other affordable options such as the Movidius neural compute stick, Intel graphics running OpenVINO, and the Google Edge TPU.

In this post, I will show you how to run a Keras model on the Jetson Nano.

Here is a breakdown of how to make it happen.

  1. Freeze the Keras model to a TensorFlow graph, then create an inference graph with TensorRT.
  2. Load the TensorRT inference graph on the Jetson Nano and make predictions.

We will do the first step on a development machine, since it is computationally intensive and needs far more resources than the Jetson Nano can provide.

Let's get started.

Set up Jetson Nano

Follow the official getting started guide to flash the latest SD card image, set up the board, and boot it.

One thing to keep in mind: unlike the latest Raspberry Pi, the Jetson Nano doesn't come with a Wi-Fi radio, so it is recommended to have a USB Wi-Fi dongle like this one ready unless you plan to use its Ethernet jack instead.

Install TensorFlow on Jetson Nano

There is a thread on the Nvidia developer forum about official support of TensorFlow on Jetson Nano; here is a quick rundown of how you can install it.

Start a terminal or SSH to your Jetson Nano, then run these commands.

sudo apt update
sudo apt install python3-pip libhdf5-serial-dev hdf5-tools
pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu==1.13.1+nv19.3 --user

If you run into the error below,

Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel

try running

sudo apt install python3.6-dev

Python 3 might be updated to a later version in the future. You can always check your version first with python3 --version and change the previous command accordingly.
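As a small sketch of that version check, the snippet below builds the package name from the running interpreter. The helper name dev_package_name is my own, and the pythonX.Y-dev naming pattern is an assumption based on the python3.6-dev package used above, so confirm the package exists before installing it.

```python
import sys


def dev_package_name(version_info=sys.version_info):
    """Return the assumed apt package name for the matching Python headers.

    Assumption: the package follows the pythonX.Y-dev pattern, as
    python3.6-dev does for Python 3.6.
    """
    major, minor = version_info[0], version_info[1]
    return "python{}.{}-dev".format(major, minor)


if __name__ == "__main__":
    # Prints the install command for the interpreter running this script.
    print("sudo apt install " + dev_package_name())
```

Run it with python3 on the Jetson Nano so that the reported version matches the interpreter pip3 installs into.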

It is also helpful to install Jupyter Notebook so you can remotely connect to it from a development machine.

pip3 install jupyter

Also, notice that Python OpenCV version 3.3.1 is already installed, which spares you much of the pain of cross-compiling. You can verify this by importing the cv2 library from the Python 3 command line interface.
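If you prefer a script over typing imports by hand, here is a minimal sketch that reports the installed versions of cv2 and tensorflow. The helper installed_version is a hypothetical name of mine; it only relies on the __version__ attribute that both packages expose.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Fall back to "unknown" for modules that do not define __version__.
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in ("cv2", "tensorflow"):
        print(name, installed_version(name))
```

A result of None means the import failed, which usually points to a missing or broken installation.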

Step 1: Freeze Keras model and convert into TensorRT model

Run this step on your development machine with TensorFlow nightly builds, which include TF-TRT by default, or run it on this Colab notebook's free GPU.

First, let's load a Keras model. For this tutorial, we use the pre-trained MobileNetV2 that comes with Keras; feel free to replace it with your custom model when necessary.

import os
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2 as Net

model = Net(weights='imagenet')

os.makedirs('./model', exist_ok=True)

# Save the h5 file to path specified.
model.save("./model/model.h5")

Once you have the Keras model saved as a single .h5 file, you can freeze it to a TensorFlow graph for inference.

Take note of the input and output node names printed in the output. We will need them when converting to the TensorRT inference graph and when making predictions.

For the Keras MobileNetV2 model, they are ['input_1'] and ['Logits/Softmax'].

import tensorflow as tf
from tensorflow.python.framework import graph_io
from tensorflow.keras.models import load_model


# Clear any previous session.
tf.keras.backend.clear_session()

save_pb_dir = './model'
model_fname = './model/model.h5'


def freeze_graph(graph, session, output, save_pb_dir='.', save_pb_name='frozen_model.pb', save_pb_as_text=False):
    """Freeze the session's graph into constants and write it to a .pb file."""
    with graph.as_default():
        # Strip nodes that are only needed for training.
        graphdef_inf = tf.graph_util.remove_training_nodes(graph.as_graph_def())
        # Replace variables with constants holding their current values.
        graphdef_frozen = tf.graph_util.convert_variables_to_constants(session, graphdef_inf, output)
        graph_io.write_graph(graphdef_frozen, save_pb_dir, save_pb_name, as_text=save_pb_as_text)
        return graphdef_frozen

# This line must be executed before loading Keras model.
tf.keras.backend.set_learning_phase(0) 

model = load_model(model_fname)

session = tf.keras.backend.get_session()

input_names = [t.op.name for t in model.inputs]
output_names = [t.op.name for t in model.outputs]

# Print the input and output node names; take note of them.
print(input_names, output_names)

frozen_graph = freeze_graph(session.graph, session, output_names, save_pb_dir=save_pb_dir)

Normally this frozen graph is what you would use for deployment. However, it is not optimized for either speed or resource efficiency on the Jetson Nano. That is where TensorRT comes into play: it quantizes the model from FP32 to FP16, effectively reducing memory consumption. It also fuses layers and tensors together, which further optimizes the use of GPU memory and bandwidth. All this comes with little or no noticeable loss of accuracy.
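To get a feel for what moving from FP32 to FP16 means for a single number, here is a small sketch using only Python's struct module. It illustrates the two number formats, not what TensorRT does internally, and the weight value is a made-up example.

```python
import struct


def roundtrip(value, fmt):
    """Pack a float in the given struct format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


weight = 0.1234567  # hypothetical weight value, for illustration only

fp32_bytes = struct.calcsize("<f")  # size of one single-precision value
fp16_bytes = struct.calcsize("<e")  # size of one half-precision value

# Difference between the original value and its half-precision round trip.
fp16_error = abs(roundtrip(weight, "<e") - weight)

print("bytes per value: FP32={}, FP16={}".format(fp32_bytes, fp16_bytes))
print("FP16 rounding error for {}: {:.2e}".format(weight, fp16_error))
```

Each value takes half the storage in FP16, at the cost of a small rounding error per value.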

And this can be done in a single call,

import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)
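The max_workspace_size_bytes argument is given in bytes, and the shift expression can be hard to read at a glance. The sketch below spells out the arithmetic; mib_to_bytes is a helper name I made up, and which size suits your model is something you would need to find out by experiment.

```python
def mib_to_bytes(mib):
    """Convert mebibytes to bytes (1 MiB = 2**20 bytes)."""
    return mib << 20


workspace = 1 << 25  # the value used in the call above

print(workspace)                     # 33554432
print(workspace == mib_to_bytes(32)) # True
```

So the call above allows a workspace of 32 MiB.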

The result is also a TensorFlow graph but optimized to run on your Jetson Nano with TensorRT. Let's save it as a single .pb file.

graph_io.write_graph(trt_graph, "./model/",
                     "trt_graph.pb", as_text=False)

Download the TensorRT graph .pb file from either Colab or your local machine onto your Jetson Nano. You can use scp/sftp to copy the file remotely: on Windows you can use WinSCP, and on Linux/Mac you can run scp/sftp from the command line.

Step 2: Load TensorRT graph and make predictions

On your Jetson Nano, start a Jupyter Notebook with the command jupyter notebook --ip=0.0.0.0 from the directory where you saved the downloaded graph file as ./model/trt_graph.pb. The following code loads the TensorRT graph and makes it ready for inference.

The input and output names might be different if you choose a Keras model other than MobileNetV2.

output_names = ['Logits/Softmax']
input_names = ['input_1']

import tensorflow as tf


def get_frozen_graph(graph_file):
    """Read Frozen Graph file from disk."""
    with tf.gfile.GFile(graph_file, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def


trt_graph = get_frozen_graph('./model/trt_graph.pb')

# Create session and load graph
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
tf_sess = tf.Session(config=tf_config)
tf.import_graph_def(trt_graph, name='')


# Get graph input size
for node in trt_graph.node:
    if 'input_' in node.name:
        size = node.attr['shape'].shape
        image_size = [size.dim[i].size for i in range(1, 4)]
        break
print("image_size: {}".format(image_size))


# input and output tensor names.
input_tensor_name = input_names[0] + ":0"
output_tensor_name = output_names[0] + ":0"

print("input_tensor_name: {}\noutput_tensor_name: {}".format(
    input_tensor_name, output_tensor_name))

output_tensor = tf_sess.graph.get_tensor_by_name(output_tensor_name)
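If you did not write down the input name in Step 1, one way to recover it is to list the Placeholder nodes of the loaded graph. The sketch below is a hypothetical helper of mine, and it assumes the model's inputs show up as Placeholder ops, which is what I would expect for a Keras input layer. The demo graph is a stand-in built from plain objects with made-up node names so the snippet runs without TensorFlow; on the Jetson Nano you would pass trt_graph instead.

```python
from types import SimpleNamespace


def find_nodes_by_op(graph_def, op_type):
    """Return the names of all nodes in graph_def whose op equals op_type."""
    return [node.name for node in graph_def.node if node.op == op_type]


# Stand-in for a GraphDef: only the .name and .op fields are imitated,
# and the middle node name is invented for illustration.
demo_graph = SimpleNamespace(node=[
    SimpleNamespace(name="input_1", op="Placeholder"),
    SimpleNamespace(name="Conv1/Conv2D", op="Conv2D"),
    SimpleNamespace(name="Logits/Softmax", op="Softmax"),
])

print(find_nodes_by_op(demo_graph, "Placeholder"))  # ['input_1']
```

The output node is harder to guess from the graph alone, so the names printed in Step 1 remain the most reliable source.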

Now we can make a prediction on an elephant picture and see if the model gets it right.

import numpy as np
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions

# Optional image to test model prediction.
img_path = './data/elephant.jpg'

img = image.load_img(img_path, target_size=image_size[:2])
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

feed_dict = {
    input_tensor_name: x
}
preds = tf_sess.run(output_tensor, feed_dict)

# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])

Benchmark results

Let's run inference several times and see how fast it can go.

import time
times = []
for i in range(20):
    start_time = time.time()
    one_prediction = tf_sess.run(output_tensor, feed_dict)
    delta = (time.time() - start_time)
    times.append(delta)
mean_delta = np.array(times).mean()
fps = 1 / mean_delta
print('average(sec):{:.2f},fps:{:.2f}'.format(mean_delta, fps))

It reached 27.18 FPS, which can be considered real-time prediction. For comparison, the Keras model runs inference at 60 FPS on Colab's Tesla K80 GPU, roughly twice as fast as the Jetson Nano, but that is a data center card.
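The loop above times every call, including the first ones, which may carry one-time start-up cost. A variation you could try is to run a few untimed warm-up calls first and report the median next to the mean. The sketch below is not the code that produced the numbers above; the benchmark helper and its defaults are my own, and it uses a stand-in workload so it runs on any machine.

```python
import statistics
import time


def benchmark(fn, warmup=3, runs=20):
    """Call fn() `warmup` times untimed, then time `runs` further calls.

    Returns (mean_seconds, median_seconds, fps_from_mean).
    """
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    mean = statistics.mean(times)
    # Guard against a zero mean for workloads below the timer resolution.
    fps = 1.0 / mean if mean > 0 else float("inf")
    return mean, statistics.median(times), fps


# Stand-in workload. On the Jetson Nano you would pass:
#   lambda: tf_sess.run(output_tensor, feed_dict)
mean, median, fps = benchmark(lambda: sum(range(10000)))
print("mean(sec):{:.6f}, median(sec):{:.6f}, fps:{:.2f}".format(mean, median, fps))
```

The median is less sensitive than the mean to an occasional slow call, so a large gap between the two hints at outliers in the timings.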

Conclusion and Further reading

In this tutorial, we walked through how to convert and optimize your Keras image classification model with TensorRT and run inference on the Jetson Nano dev kit. Now try another Keras ImageNet model or your custom model, connect a USB webcam or Raspberry Pi camera to it, and do a real-time prediction demo. Be sure to share your results with us in the comments below.

In the future, we will look into running models for other applications, such as object detection. If you are interested in other affordable edge computing options, check out my previous post on how to run Keras model inference 3x faster with CPU and Intel OpenVINO, which also works for the Movidius neural compute stick on Linux/Windows and Raspberry Pi.

The source code for this tutorial is available on my GitHub repo. You can also skip the step 1 model conversion and download the trt_graph.pb file directly from the GitHub repo releases.
