
The Jetson Nano Developer Kit, announced at GTC 2019 for $99, brings a new rival to the arena of edge computing hardware alongside its pricier predecessors, the Jetson TX1 and TX2. The arrival of the Jetson Nano gives Nvidia a competitive edge over other affordable options such as the Movidius neural compute stick, Intel Graphics running OpenVINO, and the Google Edge TPU.

In this post, I will show you how to run a Keras model on the Jetson Nano.

Here is a breakdown of how to make it happen:

- Freeze the Keras model to a TensorFlow graph, then create an inference graph with TensorRT.
- Load the TensorRT inference graph on the Jetson Nano and make predictions.

We will do the first step on a development machine, since it is far more compute- and resource-intensive than what the Jetson Nano can handle.

Let's get started.

Follow the official getting started guide to flash the latest SD card image, set up, and boot the board.

One thing to keep in mind: the Jetson Nano doesn't come with a Wi-Fi radio as the latest Raspberry Pi does, so it is recommended to have a USB Wi-Fi dongle ready unless you plan to hardwire its Ethernet jack instead.

There is a thread on the Nvidia developer forum about official support of TensorFlow on the Jetson Nano; here is a quick rundown of how you can install it.

Start a terminal or SSH into your Jetson Nano, then run these commands.

```
sudo apt update
sudo apt install python3-pip libhdf5-serial-dev hdf5-tools
pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu==1.13.1+nv19.3 --user
```

In case you run into the error below,

```
Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel
```

try running

```
sudo apt install python3.6-dev
```

Python 3 might get updated to a later version in the future. You can always check your version first with **python3 --version** and change the previous command accordingly.
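As a quick sketch, you can also read the version from within Python to work out the matching `pythonX.Y-dev` package name (the mapping to an apt package name is the only assumption here):

```python
import sys

# The running interpreter version determines which dev package to install,
# e.g. Python 3.6 -> apt package python3.6-dev.
major, minor = sys.version_info[:2]
print("Python {}.{} -> install python{}.{}-dev".format(major, minor, major, minor))
```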

It is also helpful to install Jupyter Notebook so you can remotely connect to it from a development machine.

```
pip3 install jupyter
```

Also, notice that Python OpenCV version 3.3.1 comes pre-installed, which spares you a lot of the pain of cross-compiling. You can verify this by importing the **cv2** library from the Python3 command line interface.

Run this step on your development machine with a TensorFlow nightly build, which includes TF-TRT by default, or run it on a Colab notebook's free GPU.

First, let's load a Keras model. For this tutorial, we use the pre-trained MobileNetV2 that comes with Keras; feel free to replace it with your custom model where necessary.

```
import os
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2 as Net
model = Net(weights='imagenet')
os.makedirs('./model', exist_ok=True)
# Save the h5 file to path specified.
model.save("./model/model.h5")
```

Once you have the Keras model saved as a single `.h5` file, you can freeze it to a TensorFlow graph for inferencing.

Take note of the input and output node names printed in the output. We will need them when converting the `TensorRT` inference graph and when making predictions.

For the Keras MobileNetV2 model, they are `['input_1']` and `['Logits/Softmax']`.

```
import tensorflow as tf
from tensorflow.python.framework import graph_io
from tensorflow.keras.models import load_model

# Clear any previous session.
tf.keras.backend.clear_session()

save_pb_dir = './model'
model_fname = './model/model.h5'

def freeze_graph(graph, session, output, save_pb_dir='.', save_pb_name='frozen_model.pb', save_pb_as_text=False):
    with graph.as_default():
        graphdef_inf = tf.graph_util.remove_training_nodes(graph.as_graph_def())
        graphdef_frozen = tf.graph_util.convert_variables_to_constants(session, graphdef_inf, output)
        graph_io.write_graph(graphdef_frozen, save_pb_dir, save_pb_name, as_text=save_pb_as_text)
        return graphdef_frozen

# This line must be executed before loading the Keras model.
tf.keras.backend.set_learning_phase(0)

model = load_model(model_fname)
session = tf.keras.backend.get_session()

input_names = [t.op.name for t in model.inputs]
output_names = [t.op.name for t in model.outputs]

# Prints input and output node names, take note of them.
print(input_names, output_names)

frozen_graph = freeze_graph(session.graph, session, [out.op.name for out in model.outputs], save_pb_dir=save_pb_dir)
```

Normally this frozen graph is what you would use for deployment. However, it is not optimized for either speed or resource efficiency on the Jetson Nano. That is where TensorRT comes into play: it quantizes the model from FP32 to FP16, effectively reducing memory consumption. It also fuses layers and tensors together, which further optimizes the use of GPU memory and bandwidth. All this comes with little or no noticeable loss of accuracy.

And this can be done in a single call,

```
import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)
```

The result is also a TensorFlow graph, but one optimized to run on your Jetson Nano with TensorRT. Let's save it as a single `.pb` file.

```
graph_io.write_graph(trt_graph, "./model/",
                     "trt_graph.pb", as_text=False)
```
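As an aside, the memory saving promised by FP16 quantization is easy to see with plain NumPy. This is an illustrative sketch only; TensorRT performs the cast internally when it builds the inference graph:

```python
import numpy as np

# A stand-in weight tensor in single precision (FP32): 4 bytes per element.
weights_fp32 = np.random.rand(1000, 1000).astype(np.float32)

# The same tensor cast to half precision (FP16): 2 bytes per element,
# so the memory footprint is halved.
weights_fp16 = weights_fp32.astype(np.float16)

print("FP32 bytes:", weights_fp32.nbytes)
print("FP16 bytes:", weights_fp16.nbytes)
```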

Download the TensorRT graph `.pb` file, either from Colab or from your local machine, onto your Jetson Nano. You can use scp/sftp to copy the file remotely: on Windows, use WinSCP; on Linux/Mac, try scp/sftp from the command line.

On your Jetson Nano, start a Jupyter Notebook with the command `jupyter notebook --ip=0.0.0.0` from the directory where you saved the downloaded graph file as `./model/trt_graph.pb`. The following code will load the TensorRT graph and make it ready for inferencing.

The output and the input names might be different for your choice of Keras model other than the MobileNetV2.

```
output_names = ['Logits/Softmax']
input_names = ['input_1']

import tensorflow as tf

def get_frozen_graph(graph_file):
    """Read Frozen Graph file from disk."""
    with tf.gfile.FastGFile(graph_file, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def

trt_graph = get_frozen_graph('./model/trt_graph.pb')

# Create session and load graph.
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
tf_sess = tf.Session(config=tf_config)
tf.import_graph_def(trt_graph, name='')

# Get graph input size.
for node in trt_graph.node:
    if 'input_' in node.name:
        size = node.attr['shape'].shape
        image_size = [size.dim[i].size for i in range(1, 4)]
        break
print("image_size: {}".format(image_size))

# Input and output tensor names.
input_tensor_name = input_names[0] + ":0"
output_tensor_name = output_names[0] + ":0"
print("input_tensor_name: {}\noutput_tensor_name: {}".format(
    input_tensor_name, output_tensor_name))

output_tensor = tf_sess.graph.get_tensor_by_name(output_tensor_name)
```

Now we can make a prediction with an elephant picture and see if the model gets it right.

```
import numpy as np
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions

# Optional image to test model prediction.
img_path = './data/elephant.jpg'

img = image.load_img(img_path, target_size=image_size[:2])
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

feed_dict = {
    input_tensor_name: x
}

preds = tf_sess.run(output_tensor, feed_dict)

# Decode the results into a list of tuples (class, description, probability),
# one such list for each sample in the batch.
print('Predicted:', decode_predictions(preds, top=3)[0])
```

Let's run inference several times and see how fast it can go.

```
import time
import numpy as np

times = []
for i in range(20):
    start_time = time.time()
    one_prediction = tf_sess.run(output_tensor, feed_dict)
    delta = (time.time() - start_time)
    times.append(delta)

mean_delta = np.array(times).mean()
fps = 1 / mean_delta
print('average(sec):{:.2f},fps:{:.2f}'.format(mean_delta, fps))

It achieved 27.18 FPS, which can be considered real-time prediction. For comparison, the Keras model can run inference at 60 FPS on Colab's Tesla K80 GPU, which is about twice as fast as the Jetson Nano, but that is a data center card.
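One caveat with benchmarks like this: the first few `tf_sess.run` calls tend to be slower because CUDA kernels and TensorRT engines are still initializing, so it is worth discarding a few warm-up iterations before timing. A minimal sketch of that pattern in plain Python, with a placeholder `predict` function standing in for the session call:

```python
import time

def predict():
    # Placeholder for tf_sess.run(output_tensor, feed_dict).
    time.sleep(0.01)

# Discard warm-up iterations so one-time initialization cost
# does not skew the average.
for _ in range(5):
    predict()

times = []
for _ in range(20):
    start = time.perf_counter()
    predict()
    times.append(time.perf_counter() - start)

mean_delta = sum(times) / len(times)
print("average(sec): {:.4f}, fps: {:.2f}".format(mean_delta, 1 / mean_delta))
```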

In this tutorial, we walked through how to convert and optimize your Keras image classification model with TensorRT and run inference on the Jetson Nano dev kit. Now, try another Keras ImageNet model or your custom model, connect a USB webcam or Raspberry Pi camera to it, and do a real-time prediction demo; be sure to share your results with us in the comments below.

In the future, we will look into running models for other applications, such as object detection. If you are interested in other affordable edge computing options, check out my previous post on how to run Keras model inference three times faster with a CPU and Intel OpenVINO, which also works for the Movidius neural compute stick on Linux/Windows and the Raspberry Pi.

