In this tutorial, I will show you how to run inference with your custom trained TensorFlow object detection model on Intel graphics at least 2x faster with the OpenVINO toolkit compared to the TensorFlow CPU backend. My benchmark also shows the solution is only 22% slower than the TensorFlow GPU backend on a GTX 1070 card.
If you are new to the OpenVINO toolkit, it is suggested to take a look at the previous tutorial on how to convert a Keras image classification model and accelerate inference speed with OpenVINO. This time, we will take a step further with an object detection model.
To convert a TensorFlow frozen object detection graph to OpenVINO Intermediate Representation (IR) files, you will need these two files ready: the frozen inference graph (.pb file) and its corresponding training pipeline configuration file.
You can also download my copy of those files from the GitHub release.
Similar to the previous image classification model, you will specify the data type to quantize the model weights.
The data type can be "FP16" or "FP32", depending on which device you want to run the converted model on.
Generally speaking, an FP16 quantized model cuts the size of the weights in half and runs much faster, but may come with slightly degraded accuracy.
Another important file is the OpenVINO subgraph replacement configuration file that describes the rules to convert specific TensorFlow topologies. For the models downloaded from the TensorFlow Object Detection API model zoo, you can find the configuration files in <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf.
Use:
- ssd_v2_support.json - for frozen SSD topologies from the models zoo.
- faster_rcnn_support.json - for frozen Faster R-CNN topologies from the models zoo.
- faster_rcnn_support_api_v1.7.json - for Faster R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.7.0 or higher.
Since our model is an SSD topology, we will use ssd_v2_support.json.
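Before running the Model Optimizer, here is a sketch of how the paths and parameters referenced in the command below might be defined in a notebook cell. The OpenVINO install directory and the file names (frozen_inference_graph.pb and the training pipeline.config) are assumptions; adjust them to your own setup.
import os

# Assumed default OpenVINO install directory on Linux; adjust for your machine.
install_dir = '/opt/intel/openvino'
mo_tf_path = os.path.join(install_dir, 'deployment_tools/model_optimizer/mo_tf.py')

# Hypothetical paths to your frozen graph, training pipeline config and output folder.
pb_file = './frozen_inference_graph.pb'
pipeline = './pipeline.config'
output_dir = './output'

# Subgraph replacement configuration for SSD topologies (see the list above).
configuration_file = os.path.join(
    install_dir, 'deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json')

# SSD MobileNet V2 expects 300 x 300 inputs; pick "FP16" or "FP32" for the target device.
input_shape = [1, 300, 300, 3]
input_shape_str = str(input_shape).replace(' ', '')
data_type = 'FP16'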
With all the settings ready, we can run the Model Optimizer script.
!python {mo_tf_path} \
--input_model {pb_file} \
--output_dir {output_dir} \
--tensorflow_use_custom_operations_config {configuration_file} \
--tensorflow_object_detection_api_pipeline_config {pipeline} \
--input_shape {input_shape_str} \
--data_type {data_type}
You can then find the generated .xml and .bin files in the {output_dir} folder.
Loading the model with the OpenVINO toolkit is similar to the previous image classification example, while the way we preprocess the inputs and interpret the outputs is different.
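In case you do not have the previous example at hand, here is a minimal sketch of loading the converted IR files with the Inference Engine Python API. It assumes OpenVINO 2020.x or later and the hypothetical IR file paths from the conversion step above.
from openvino.inference_engine import IECore

# Hypothetical paths to the IR files produced by the Model Optimizer.
model_xml = './output/frozen_inference_graph.xml'
model_bin = './output/frozen_inference_graph.bin'

ie = IECore()
net = ie.read_network(model=model_xml, weights=model_bin)

# Grab the first (and only) input and output blob names.
input_blob = next(iter(net.input_info))
out_blob = next(iter(net.outputs))

# Target device can be "CPU", "GPU" (Intel graphics) or "MYRIAD" (Neural Compute Stick 2).
exec_net = ie.load_network(network=net, device_name='GPU')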
For the image preprocessing, it is a good practice to resize the image width and height to match what is defined in the `ssd_mobilenet_v2_coco.config` file, which is 300 x 300. Besides, there is no need to normalize the pixel values to 0~1; just keep them as UINT8 values ranging from 0 to 255.
Here is the preprocessing function.
import numpy as np
from PIL import Image

def pre_process_image(imagePath, img_shape):
    """Pre-process an image from an image path.
    Arguments:
        imagePath {str} -- input image file path.
        img_shape {tuple} -- target height and width as a tuple.
    Returns:
        np.array -- preprocessed image in NCHW layout.
        np.array -- original image as a numpy array.
    """
    # Model input format: batch size, channels, height, width.
    assert isinstance(img_shape, tuple) and len(img_shape) == 2
    n, c, h, w = [1, 3, img_shape[0], img_shape[1]]
    image = Image.open(imagePath)
    # PIL's resize expects (width, height).
    processed_img = image.resize((w, h), resample=Image.BILINEAR)
    processed_img = np.array(processed_img).astype(np.uint8)
    # Change data layout from HWC to CHW and add the batch dimension.
    processed_img = processed_img.transpose((2, 0, 1))
    processed_img = processed_img.reshape((n, c, h, w))
    return processed_img, np.array(image)
Now you can feed the preprocessed data to the network and get its prediction outputs as a dictionary that contains the key "DetectionOutput".
# Run inference
img_shape = (img_height, img_height)
processed_img, image = pre_process_image(fname, img_shape)
res = exec_net.infer(inputs={input_blob: processed_img})
print(res['DetectionOutput'].shape)
# Expect: (1, 1, 100, 7)
The Inference Engine "DetectionOutput" layer produces one tensor with seven numbers for each actual detection. The seven numbers stand for:
- image_id - index of the image in the batch,
- label - the predicted class id,
- conf - the prediction probability,
- x_min, y_min, x_max, y_max - the bounding box coordinates, normalized to the 0~1 range.
The class id corresponds to an entry in the label map .pbtxt file used for training, which lets you look up a human-readable class name.
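Here is a minimal, regex-based sketch of such a lookup; the load_label_map helper and the ./label_map.pbtxt path are hypothetical and assume the standard label map format.
import re

def load_label_map(pbtxt_path):
    """Parse a label map .pbtxt into a {class id: name} dictionary."""
    with open(pbtxt_path) as f:
        content = f.read()
    label_map = {}
    for item in re.findall(r'item\s*{[^}]*}', content):
        item_id = int(re.search(r'\bid:\s*(\d+)', item).group(1))
        # Prefer display_name when present, otherwise fall back to name.
        name_match = (re.search(r'display_name:\s*["\'](.+?)["\']', item)
                      or re.search(r'\bname:\s*["\'](.+?)["\']', item))
        label_map[item_id] = name_match.group(1)
    return label_map

label_map = load_label_map('./label_map.pbtxt')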
Knowing this, we can easily filter the results with a prediction probability threshold and visualize them as bounding boxes drawn around the detected objects.
import matplotlib.pyplot as plt
import matplotlib.patches as patches

probability_threshold = 0.5
# Keep only the detections above the probability threshold.
preds = [pred for pred in res['DetectionOutput'][0][0] if pred[2] > probability_threshold]

ax = plt.subplot(1, 1, 1)
plt.imshow(image)
for pred in preds:
    class_label = pred[1]
    probability = pred[2]
    print('Predict class label: {}, with probability: {}'.format(
        class_label, probability))
    # Scale the normalized box coordinates back to image pixel coordinates.
    box = pred[3:]
    box = (box * np.array(image.shape[:2][::-1] * 2)).astype(int)
    x_1, y_1, x_2, y_2 = box
    rect = patches.Rectangle((x_1, y_1), x_2 - x_1, y_2 - y_1,
                             linewidth=2, edgecolor='red', facecolor='none')
    ax.add_patch(rect)
    ax.text(x_1, y_1, '{:.0f} - {:.2f}'.format(class_label, probability),
            fontsize=12, color='yellow')
plt.show()
Here is an example showing the results of the object detection.
Let's try the ssd_mobilenet_v2 object detection model on various hardware and configurations, and here is what you get.
The benchmark setup:
As you can see, the OpenVINO model running on the Intel GPU with quantized weights achieves 50 FPS (frames per second), while the TensorFlow CPU backend only gets around 18.9 FPS.
Running the model on a Neural Compute Stick 2, either on Windows or on a Raspberry Pi, also shows promising results.
You can run the openvino_inference_benchmark.py and local_inference_test.py scripts if you want to reproduce the benchmark yourself.
This post walks you through how to convert a custom trained TensorFlow object detection model to the OpenVINO format and run inference on various hardware and configurations. The benchmark results can help you decide which setup is the best fit for your edge inferencing scenario.
How to run Keras model inference x3 times faster with CPU and Intel OpenVINO - blog
How to train an object detection model easy for free - blog
The GitHub repository for this post.