Previously, we have introduced and benchmarked several embedded edge computing solutions: OpenVINO for the Intel Neural Compute Stick, CMSIS-NN for ARM microcontrollers, and TensorRT on the Jetson Nano.
What they have in common is that each hardware provider offers its own tools and APIs to quantize a TensorFlow graph and fuse adjacent layers to accelerate inference.
This time we will take a look at the Rockchip RK3399Pro SoC with a built-in NPU (Neural Processing Unit) rated at 2.4 TOPS at 8-bit precision, which is capable of running the Inception V3 model at over 28 FPS. As you will see, deploying a Keras model to the board is quite similar to the previously mentioned solutions.
Let's get started with the first-time setup.
Any dev board with an RK3399Pro SoC, such as the Rockchip Toybrick RK3399PRO Board or the Firefly Core-3399Pro, should work. I have a Rockchip Toybrick RK3399PRO Board with 6GB RAM (2GB dedicated to the NPU).
The board comes with many connectors and interfaces, similar to the Jetson Nano. One thing worth mentioning: the HDMI connector doesn't work with my monitor, but I was able to get a USB Type-C to HDMI adapter working.
It has Fedora Linux release 28 preinstalled with the default username and password "toybrick".
The RK3399Pro has six 64-bit CPU cores with the aarch64 architecture, the same architecture as the Jetson Nano but quite different from the Raspberry Pi 3B+, which is 32-bit ARMv7 only. This means any precompiled Python wheel packages targeting the Raspberry Pi will likely not work with the RK3399Pro or Jetson Nano. But don't despair: you can download precompiled aarch64 Python wheel files, including scipy, onnx, and tensorflow, from my aarch64_python_packages repo, and rknn_toolkit from their official GitHub.
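If you want to double-check what architecture a given device reports, a quick check with standard Python (nothing RKNN-specific) looks like this; expect "aarch64" on the RK3399Pro and Jetson Nano, and "armv7l" on a 32-bit Raspberry Pi OS.
import platform

# Prints the machine architecture the OS reports, e.g. "aarch64".
print(platform.machine())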
Transfer those wheel files to the RK3399Pro board, then run the following commands.
sudo dnf update -y
sudo dnf install -y cmake gcc gcc-c++ protobuf-devel protobuf-compiler lapack-devel
sudo dnf install -y python3-devel python3-opencv python3-numpy-f2py python3-h5py python3-lmdb
sudo dnf install -y python3-grpcio
sudo pip3 install scipy-1.2.0-cp36-cp36m-linux_aarch64.whl
sudo pip3 install onnx-1.4.1-cp36-cp36m-linux_aarch64.whl
sudo pip3 install tensorflow-1.10.1-cp36-cp36m-linux_aarch64.whl
sudo pip3 install rknn_toolkit-0.9.9-cp36-cp36m-linux_aarch64.whl
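Once the installs finish, a quick import check confirms the packages are usable (version numbers correspond to the wheels above):
import tensorflow as tf
from rknn.api import RKNN

print(tf.__version__)  # expect 1.10.1 from the wheel above
rknn = RKNN()          # constructing the object should not raise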
The conversion from a TensorFlow graph to an RKNN model will take considerable time if you run it on the development board, so it is recommended to use a Linux development machine instead, whether that is Windows WSL, an Ubuntu VM, or even Google Colab.
To set up your development machine for the first time, install the packages below; you can find the RKNN toolkit wheel package files on their official GitHub.
pip3 install -U tensorflow scipy onnx
pip3 install rknn_toolkit-0.9.9-cp36-cp36m-linux_x86_64.whl
# Or if you have Python 3.5
# pip3 install rknn_toolkit-0.9.9-cp35-cp35m-linux_x86_64.whl
Freezing a Keras model to a single .pb file is similar to previous tutorials; you can find the code in freeze_graph.py on GitHub. Once it is done, you will have an ImageNet InceptionV3 frozen model that accepts inputs with shape (N, 299, 299, 3).
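For reference, here is a minimal sketch of what such a freezing script looks like, assuming TensorFlow 1.x and tf.keras (the exact code in freeze_graph.py may differ):
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3

# Build the model in inference mode so no training-only ops end up in the graph.
tf.keras.backend.set_learning_phase(0)
model = InceptionV3(weights="imagenet")
session = tf.keras.backend.get_session()

# Bake the variable weights into constants, keeping only the prediction path.
output_names = [out.op.name for out in model.outputs]  # ["predictions/Softmax"]
frozen_graph = tf.graph_util.convert_variables_to_constants(
    session, session.graph.as_graph_def(), output_names
)
tf.train.write_graph(frozen_graph, "./model", "frozen_model.pb", as_text=False)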
Take note of the input and output node names, since we will specify them when loading the frozen model with the RKNN toolkit. For InceptionV3 and many other Keras ImageNet models they will be:
INPUT_NODE: ['input_1']
OUTPUT_NODE: ['predictions/Softmax']
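If you are converting a different model and are unsure of its node names, one way to find them is to inspect the frozen graph directly (a quick sketch, assuming TF 1.x):
import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("./model/frozen_model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Inputs are usually Placeholder ops near the top; outputs near the bottom.
for node in list(graph_def.node)[:3] + list(graph_def.node)[-3:]:
    print(node.op, node.name)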
Then you can run the convert_rknn.py script to quantize your model to the uint8 data type, or more specifically, the asymmetric quantized uint8 type.
With asymmetric quantization, the quantized range is fully utilized, unlike in symmetric mode: the min/max values of the float range are mapped exactly to the min/max of the quantized range. Below is an illustration of the two range-based linear quantization methods. You can read more about it here.
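To make the idea concrete, here is a toy numpy sketch of asymmetric uint8 quantization (an illustration only, not RKNN's actual implementation):
import numpy as np

def asymmetric_quantize_u8(x):
    # Map [min, max] of the float tensor onto the full uint8 range [0, 255].
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min) / 255.0
    zero_point = np.round(-x_min / scale)
    q = np.clip(np.round(x / scale + zero_point), 0, 255).astype(np.uint8)
    return q, scale, zero_point

x = np.array([-0.7, 0.0, 0.4, 1.3], dtype=np.float32)
q, scale, zero_point = asymmetric_quantize_u8(x)
print(q)  # [  0  89 140 255] -- the full range is used even though min != -max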
rknn.config() also allows you to specify channel_mean_value with four values (M0, M1, M2, S0) as a way to automatically normalize uint8 (0~255) image data to a different range in the inference pipeline. Keras ImageNet models with the TensorFlow backend expect image values normalized between -1 and 1. To accomplish this, we set channel_mean_value to "128 128 128 128", where the first three values are the mean values for the RGB color channels and the last value is a scale parameter. The output data is calculated as follows.
R_out = (R - M0)/S0
G_out = (G - M1)/S0
B_out = (B - M2)/S0
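As a quick sanity check of the arithmetic, with channel_mean_value set to "128 128 128 128", a pixel value of 0 maps to (0 - 128)/128 = -1 and 255 maps to (255 - 128)/128 ≈ 0.992, so uint8 inputs land in roughly [-1, 1]:
import numpy as np

# Reproducing the normalization outside the pipeline for one RGB pixel,
# with channel_mean_value = "128 128 128 128".
pixel = np.array([0, 128, 255], dtype=np.uint8)
print((pixel.astype(np.float32) - 128.0) / 128.0)  # [-1.  0.  0.9921875]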
If you use Python OpenCV to read or capture images, the color channels are in BGR order. In that case, you can set the reorder_channel parameter of rknn.config() to "2 1 0" so the color channels are reordered to RGB in the inference pipeline.
from rknn.api import RKNN
INPUT_NODE = ["input_1"]
OUTPUT_NODE = ["predictions/Softmax"]
img_height = 299
# Create RKNN object
rknn = RKNN()
# pre-process config
# channel_mean_value "0 0 0 255" while normalize the image data to range [0, 1]
# channel_mean_value "128 128 128 128" while normalize the image data to range [-1, 1]
# reorder_channel "0 1 2" will keep the color channel, "2 1 0" will swap the R and B channel,
# i.e. if the input is BGR loaded by cv2.imread, it will convert it to RGB for the model input.
# need_horizontal_merge is suggested for inception models (v1/v3/v4).
rknn.config(
channel_mean_value="128 128 128 128",
reorder_channel="0 1 2",
need_horizontal_merge=True,
quantized_dtype="asymmetric_quantized-u8",
)
# Load tensorflow model
ret = rknn.load_tensorflow(
tf_pb="./model/frozen_model.pb",
inputs=INPUT_NODE,
outputs=OUTPUT_NODE,
input_size_list=[[img_height, img_height, 3]],
)
if ret != 0:
print("Load inception_v3 failed!")
exit(ret)
# Build model
# dataset: A input data set for rectifying quantization parameters.
ret = rknn.build(do_quantization=True, dataset="./dataset.txt")
if ret != 0:
print("Build inception_v3 failed!")
exit(ret)
# Export rknn model
ret = rknn.export_rknn("./inception_v3.rknn")
if ret != 0:
print("Export inception_v3.rknn failed!")
exit(ret)
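The dataset.txt passed to rknn.build() is a plain text file listing paths to sample images, one per line; the toolkit feeds them through the network to calibrate the quantization ranges. For example (the file names here are illustrative):
./data/elephant.jpg
./data/cat.jpg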
Once the conversion succeeds, you will find inception_v3.rknn in the project directory. Transfer the file to the dev board for inference. The inference pipeline takes care of image normalization and color channel reordering as configured in the previous step; what's left for you is loading the model, initializing the runtime environment, and running the inference.
import numpy as np
import cv2
from rknn.api import RKNN
# Create RKNN object
rknn = RKNN()
img_height = 299
# Direct Load RKNN Model
ret = rknn.load_rknn("./inception_v3.rknn")
if ret != 0:
    print("Load inception_v3.rknn failed!")
    exit(ret)
# Set inputs
img = cv2.imread("./data/elephant.jpg")
img = cv2.resize(img, dsize=(img_height, img_height), interpolation=cv2.INTER_CUBIC)
# This can be skipped if "reorder_channel" is set to "2 1 0" in
# rknn.config() in `convert_rknn.py`.
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# init runtime environment
print("--> Init runtime environment")
ret = rknn.init_runtime(target="rk3399pro")
if ret != 0:
    print("Init runtime environment failed")
    exit(ret)
# Inference
outputs = rknn.inference(inputs=[img])
outputs = np.array(outputs)
rknn.release()
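The output is the model's 1000-way ImageNet softmax. Here is one quick way to inspect the top predictions from the run above (assuming the standard Keras ImageNet class ordering):
# Show the five highest-probability class indices and their scores.
probs = outputs[0].reshape(-1)
top5 = probs.argsort()[-5:][::-1]
for idx in top5:
    print("class {:4d}: prob {:.4f}".format(idx, probs[idx]))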
Now for a quick benchmark: let's run the inference several times and see how fast it can go.
import time
times = []
# Run inference 20 times and take the average.
for i in range(20):
    start_time = time.time()
    # Use the internal API call directly.
    results = rknn.rknn_base.inference(
        inputs=[img], data_type="uint8", data_format="nhwc", outputs=None
    )
    # Alternatively, use the public API call.
    # outputs = rknn.inference(inputs=[img])
    delta = time.time() - start_time
    times.append(delta)
# Calculate the average time for inference.
mean_delta = np.array(times).mean()
fps = 1 / mean_delta
print("average(sec):{:.3f},fps:{:.2f}".format(mean_delta, fps))
It achieves an average of 28.94 FPS, even faster than the Jetson Nano's 27.18 FPS running a much smaller MobileNetV2 model.
This post showed you how to get started with an RK3399Pro dev board and how to convert and run a Keras image classification model on its NPU at real-time speed.