How to run TensorFlow object detection model faster with Intel Graphics



In this tutorial, I will show you how to run inference with your custom-trained TensorFlow object detection model on Intel graphics at least 2x faster with the OpenVINO toolkit than with the TensorFlow CPU backend. My benchmark also shows that this solution is only 22% slower than the TensorFlow GPU backend running on a GTX 1070 card.

If you are new to the OpenVINO toolkit, I suggest taking a look at the previous tutorial on how to convert a Keras image classification model and accelerate its inference with OpenVINO. This time, we will take a step further with an object detection model.


To convert a TensorFlow frozen object detection graph to OpenVINO Intermediate Representation (IR) files, you need the following two files:

  • The frozen TensorFlow object detection model, i.e. the `frozen_inference_graph.pb` file downloaded from Colab after training.
  • The modified pipeline config file used for training, also downloaded from Colab after training. In our case, it is the `ssd_mobilenet_v2_coco.config` file.

You can also download my copy of those files from the GitHub release.

OpenVINO model optimization

As with the previous image classification model, you specify the data type used to quantize the model weights.
The data type can be "FP16" or "FP32", depending on the device you want to run the converted model on.

  • FP16: GPU and MYRIAD (Movidius neural compute stick)
  • FP32: CPU and GPU
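
The mapping above can be captured in a small lookup. This is only an illustrative sketch: `SUPPORTED_DEVICES` and `pick_data_type` are hypothetical names, not part of OpenVINO, and the table simply mirrors the two bullets above.

```python
# Hypothetical helper mirroring the list above; not an OpenVINO API.
SUPPORTED_DEVICES = {
    "FP16": ("GPU", "MYRIAD"),
    "FP32": ("CPU", "GPU"),
}


def pick_data_type(device, prefer_fp16=True):
    """Return a data type from the table above that lists `device`."""
    order = ("FP16", "FP32") if prefer_fp16 else ("FP32", "FP16")
    for data_type in order:
        if device in SUPPORTED_DEVICES[data_type]:
            return data_type
    raise ValueError("Unknown device: {}".format(device))


print(pick_data_type("MYRIAD"))                   # FP16
print(pick_data_type("CPU"))                      # FP32
print(pick_data_type("GPU", prefer_fp16=False))   # FP32
```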

Generally speaking, an FP16 quantized model cuts the size of the weights in half and runs much faster, but may come with a minor loss of accuracy.
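
To make the size claim concrete, here is a minimal NumPy sketch. The weight tensor is random stand-in data, not taken from the actual model, and the shape is an arbitrary assumption chosen for illustration.

```python
import numpy as np

# Stand-in for one layer's weights; random values, not from the real model.
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal((3, 3, 32, 64)).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

# Two bytes per value instead of four, so half the storage.
print(weights_fp32.nbytes, weights_fp16.nbytes)  # 73728 36864

# Rounding error introduced by the narrower type.
max_abs_error = np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32)))
print(max_abs_error)
```

The rounding error is small relative to the weight values, which is why the accuracy loss is usually minor.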

Another important file is the OpenVINO subgraph replacement configuration file that describes rules to convert specific TensorFlow topologies. For the models downloaded from the TensorFlow Object Detection API zoo, you can find the configuration files in the <INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf directory.


  • ssd_v2_support.json - for frozen SSD topologies from the models zoo.
  • faster_rcnn_support.json - for frozen Faster R-CNN topologies from the models zoo.
  • faster_rcnn_support_api_v1.7.json - for Faster R-CNN topologies trained manually using the TensorFlow Object Detection API version 1.7.0 or higher.
  • ...

We will pick ssd_v2_support.json for this tutorial since it is an SSD model.

With all the settings ready, we can run the model optimization script.

!python {mo_tf_path} \
    --input_model {pb_file} \
    --output_dir {output_dir} \
    --tensorflow_use_custom_operations_config {configuration_file} \
    --tensorflow_object_detection_api_pipeline_config {pipeline} \
    --input_shape {input_shape_str} \
    --data_type {data_type}

You can find the .xml and .bin files located in the specified {output_dir} after the conversion.
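
The placeholders in the command above need concrete values. The sketch below shows one way to assemble the same call as an argument list. The helper name `build_mo_command`, the install directory, the `model` folder, and the `[1,300,300,3]` batch shape are assumptions for illustration; adjust them to your own setup.

```python
import os


def build_mo_command(mo_tf_path, pb_file, output_dir, configuration_file,
                     pipeline, input_shape, data_type="FP16"):
    """Assemble the Model Optimizer call shown above as an argument list."""
    # The optimizer expects the shape without spaces, e.g. [1,300,300,3].
    input_shape_str = str(list(input_shape)).replace(" ", "")
    return [
        "python", mo_tf_path,
        "--input_model", pb_file,
        "--output_dir", output_dir,
        "--tensorflow_use_custom_operations_config", configuration_file,
        "--tensorflow_object_detection_api_pipeline_config", pipeline,
        "--input_shape", input_shape_str,
        "--data_type", data_type,
    ]


# Assumed locations; replace with your own <INSTALL_DIR> and model files.
install_dir = "/opt/intel/openvino"
mo_dir = os.path.join(install_dir, "deployment_tools", "model_optimizer")

cmd = build_mo_command(
    mo_tf_path=os.path.join(mo_dir, "mo_tf.py"),
    pb_file="model/frozen_inference_graph.pb",
    output_dir="model",
    configuration_file=os.path.join(
        mo_dir, "extensions", "front", "tf", "ssd_v2_support.json"),
    pipeline="model/ssd_mobilenet_v2_coco.config",
    input_shape=(1, 300, 300, 3),
)
print(" ".join(cmd))
```

Outside a notebook, the list can be passed to `subprocess.run(cmd, check=True)` instead of using the `!python` shell escape.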

Make predictions

Loading the model with the OpenVINO toolkit is similar to the previous image classification example. What differs is how we preprocess the inputs and interpret the outputs.
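
The loading step itself is not repeated in this post. As a rough sketch, assuming the legacy Inference Engine Python API (`IECore` from `openvino.inference_engine`, as shipped with the OpenVINO 2021 releases), it could look like the following. The helper names `ir_paths` and `load_detection_model` and the example path are hypothetical, and other OpenVINO releases expose different classes, so check the documentation for your version.

```python
import os


def ir_paths(xml_path):
    """Return the (.xml, .bin) pair for an IR model given its .xml path."""
    return xml_path, os.path.splitext(xml_path)[0] + ".bin"


def load_detection_model(xml_path, device="GPU"):
    """Load IR files and return (exec_net, input_blob).

    Assumes the legacy Inference Engine Python API (IECore) is installed
    and that the network has a single image input, as the SSD model does.
    """
    # Imported lazily so that ir_paths() works without OpenVINO installed.
    from openvino.inference_engine import IECore

    model_xml, model_bin = ir_paths(xml_path)
    ie = IECore()
    net = ie.read_network(model=model_xml, weights=model_bin)
    input_blob = next(iter(net.input_info))
    exec_net = ie.load_network(network=net, device_name=device)
    return exec_net, input_blob


print(ir_paths("model/frozen_inference_graph.xml"))
```

With OpenVINO installed, `exec_net, input_blob = load_detection_model("model/frozen_inference_graph.xml")` would provide the two names used in the inference code later in this post.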

For image preprocessing, it is good practice to resize the image width and height to match what is defined in the `ssd_mobilenet_v2_coco.config` file, which is 300 x 300. There is also no need to normalize the pixel values to 0~1; just keep them as UINT8 values ranging from 0 to 255.

Here is the preprocessing function.

import numpy as np
from PIL import Image


def pre_process_image(imagePath, img_shape):
    """Pre-process an image from an image path.

    Arguments:
        imagePath {str} -- input image file path.
        img_shape {tuple} -- Target height and width as a tuple.

    Returns:
        tuple -- Preprocessed image (NCHW) and the original image, as np.array.
    """
    # Model input format
    assert isinstance(img_shape, tuple) and len(img_shape) == 2

    n, c, h, w = [1, 3, img_shape[0], img_shape[1]]
    image = Image.open(imagePath)
    # PIL's resize expects the size as (width, height)
    processed_img = image.resize((w, h), resample=Image.BILINEAR)

    processed_img = np.array(processed_img).astype(np.uint8)

    # Change data layout from HWC to CHW
    processed_img = processed_img.transpose((2, 0, 1))
    processed_img = processed_img.reshape((n, c, h, w))

    return processed_img, np.array(image)
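
To see what the layout change does, here is a small self-contained check on a synthetic image. It uses random pixel values instead of a real file, so it only illustrates the transpose and reshape steps, not the resizing.

```python
import numpy as np

# Synthetic 300 x 300 RGB image in HWC layout, values 0..255 as uint8.
h, w, c = 300, 300, 3
hwc = np.random.randint(0, 256, size=(h, w, c), dtype=np.uint8)

# Same two steps as in pre_process_image: HWC -> CHW, then add a batch axis.
nchw = hwc.transpose((2, 0, 1)).reshape((1, c, h, w))

print(nchw.shape, nchw.dtype)  # (1, 3, 300, 300) uint8

# A pixel keeps its value; only its position in the array changes.
assert nchw[0, 2, 10, 20] == hwc[10, 20, 2]
```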

Now you can feed the preprocessed data to the network and get its prediction outputs as a dictionary that contains the key "DetectionOutput".

# Run inference
img_shape = (img_height, img_height)  # (height, width); the model input is square
processed_img, image = pre_process_image(fname, img_shape)
res = exec_net.infer(inputs={input_blob: processed_img})
print(res['DetectionOutput'].shape)
# Expect: (1, 1, 100, 7)

The Inference Engine "DetectionOutput" layer produces one tensor with seven numbers for each detection. The seven numbers stand for:

  • 0: batch index
  • 1: class label, defined in the label map .pbtxt file.
  • 2: class probability
  • 3: x_1 box coordinate (0~1, as a fraction of the image width, measured from the upper left corner)
  • 4: y_1 box coordinate (0~1, as a fraction of the image height, measured from the upper left corner)
  • 5: x_2 box coordinate (0~1, as a fraction of the image width, measured from the upper left corner)
  • 6: y_2 box coordinate (0~1, as a fraction of the image height, measured from the upper left corner)
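
The seven-number layout above can be unpacked into named fields and pixel coordinates. The following is a minimal sketch on made-up data: the `Detection` tuple, the `parse_detections` helper, and the 640 x 480 image size are hypothetical and only serve to illustrate the format.

```python
from collections import namedtuple

import numpy as np

Detection = namedtuple(
    "Detection", ["batch", "label", "probability", "x_1", "y_1", "x_2", "y_2"])


def parse_detections(output, image_width, image_height, threshold=0.5):
    """Convert a (1, 1, N, 7) DetectionOutput array into pixel-space boxes."""
    scale = np.array([image_width, image_height, image_width, image_height])
    detections = []
    for row in output[0][0]:
        if row[2] <= threshold:
            continue  # skip low-confidence rows
        x_1, y_1, x_2, y_2 = (row[3:7] * scale).astype(int)
        detections.append(Detection(int(row[0]), int(row[1]), float(row[2]),
                                    int(x_1), int(y_1), int(x_2), int(y_2)))
    return detections


# Made-up output with two rows: one confident detection, one below threshold.
fake_output = np.array([[[
    [0, 1, 0.9, 0.25, 0.5, 0.75, 1.0],
    [0, 2, 0.1, 0.0, 0.0, 0.5, 0.5],
]]], dtype=np.float32)

detections = parse_detections(fake_output, image_width=640, image_height=480)
for det in detections:
    print(det)
```

Only the first row survives the 0.5 threshold, and its box becomes (160, 240, 480, 480) in pixels.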

Knowing this, we can easily filter the results with a prediction probability threshold and visualize them as bounding boxes drawn around the detected objects.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

probability_threshold = 0.5

# Keep only the detections above the probability threshold
preds = [pred for pred in res['DetectionOutput'][0][0]
         if pred[2] > probability_threshold]

ax = plt.subplot(1, 1, 1)
plt.imshow(image)

for pred in preds:
    class_label = pred[1]
    probability = pred[2]
    print('Predict class label:{}, with probability: {}'.format(
        class_label, probability))
    # Scale the normalized box by (width, height, width, height)
    box = pred[3:]
    box = (box * np.array(image.shape[:2][::-1] * 2)).astype(int)
    x_1, y_1, x_2, y_2 = box
    rect = patches.Rectangle((x_1, y_1), x_2 - x_1, y_2 - y_1,
                             linewidth=2, edgecolor='red', facecolor='none')
    # Draw the box on the image
    ax.add_patch(rect)
    ax.text(x_1, y_1, '{:.0f} - {:.2f}'.format(class_label, probability),
            fontsize=12, color='yellow')

plt.show()

Here is an example to show the results of object detection.


Benchmark the inference speed

Let's try the ssd_mobilenet_v2 object detection model on various hardware and configurations. Here is what you get.

The benchmark setup:

  • Run inference 20 times and take the average.
  • Input image shape: (300,300,3)
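
The averaging step can be sketched as follows. This is not the benchmark script used for the results in this post; `benchmark` and `dummy_infer` are hypothetical names, and the sleep call is a stand-in for a real inference call.

```python
import time


def benchmark(infer_fn, n_runs=20):
    """Call infer_fn n_runs times; return (mean seconds per call, FPS)."""
    start = time.perf_counter()
    for _ in range(n_runs):
        infer_fn()
    mean_seconds = (time.perf_counter() - start) / n_runs
    return mean_seconds, 1.0 / mean_seconds


# Stand-in workload. With a loaded model, pass something like
# lambda: exec_net.infer(inputs={input_blob: processed_img}) instead.
def dummy_infer():
    time.sleep(0.005)


mean_seconds, fps = benchmark(dummy_infer)
print("{:.1f} ms per frame, {:.1f} FPS".format(mean_seconds * 1000, fps))
```

In practice it can help to run one or two untimed warm-up inferences first, since the first call often includes one-time setup cost.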


As you can see, the OpenVINO model running on the Intel GPU with quantized weights achieves 50 FPS (frames per second), while the TensorFlow CPU backend only gets around 18.9 FPS.
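
As a quick check of the two figures quoted above, the speedup and the per-frame latency follow directly from the FPS values. This is plain arithmetic on the numbers in the text, not a new measurement.

```python
openvino_gpu_fps = 50.0    # OpenVINO on Intel GPU, from the text above
tensorflow_cpu_fps = 18.9  # TensorFlow CPU backend, from the text above

speedup = openvino_gpu_fps / tensorflow_cpu_fps
print("Speedup: {:.2f}x".format(speedup))  # Speedup: 2.65x

# Time spent on a single frame, in milliseconds.
print("Latency: {:.1f} ms vs {:.1f} ms".format(
    1000 / openvino_gpu_fps, 1000 / tensorflow_cpu_fps))  # 20.0 ms vs 52.9 ms
```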

Running the model with the Neural Compute Stick 2, on either Windows or Raspberry Pi, also shows promising results.

You can run the accompanying scripts if you want to reproduce the benchmark yourself.

Conclusion and further reading

This post walks you through how to convert a custom-trained TensorFlow object detection model to the OpenVINO format and run inference on various hardware and configurations. The benchmark results can help you decide which option is the best fit for your edge inference scenario.

Related materials you might find helpful

How to run Keras model inference x3 times faster with CPU and Intel OpenVINO - blog

How to train an object detection model easy for free - blog

The GitHub repository for this post.
