In this quick tutorial, you will learn how to set up OpenVINO and make your Keras model inference at least 3x faster without any added hardware.
Though there are multiple options to speed up deep learning inference on edge devices, they all come with an additional cost. However, if an edge device already has an Intel CPU, you might as well accelerate its deep learning inference speed by 3x for free with Intel's OpenVINO toolkit.
You might wonder where the extra speedup comes from without additional hardware.
First and foremost, OpenVINO is an Intel product, so it is optimized for Intel processors.
The OpenVINO inference engine can run models on either the CPU or Intel's integrated GPU, with different input precisions supported: the CPU supports FP32 and Int8, while the GPU supports FP16 and FP32.
The CPU plugin leverages the Intel Math Kernel Library for Deep Neural Networks (MKL-DNN) as well as OpenMP to parallelize calculations.
Here is the plugin and quantization precision support matrix for OpenVINO 2019 R1.01.
| PLUGIN | FP32 | FP16 | I8 |
| --- | --- | --- | --- |
| CPU plugin | Supported and preferred | Not supported | Supported |
| GPU plugin | Supported | Supported and preferred | Not supported |
| FPGA plugin | Supported | Supported | Not supported |
| VPU plugins | Not supported | Supported | Not supported |
| GNA plugin | Supported | Not supported | Not supported |
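Since the CPU plugin parallelizes with OpenMP, one optional experiment is to cap the number of worker threads. This is a minimal sketch, assuming the plugin's OpenMP runtime honors the standard OMP_NUM_THREADS environment variable (the value shown is arbitrary); set it before the inference engine module is imported.

```python
# Optional experiment, assuming the CPU plugin's OpenMP runtime honors the
# standard OMP_NUM_THREADS variable: cap the thread count before importing
# the openvino module so its thread pool is created with this size.
import os
os.environ["OMP_NUM_THREADS"] = "4"  # arbitrary value, tune for your CPU
```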
Second, as you will see later in this tutorial, there is a model optimization step during which the model is made more compact for inference.
Now, let's set up OpenVINO on your machine: choose your OS on this page, then follow the instructions to download and install it.
If you have already installed Python 3.5+, it is safe to ignore the notice to install Python 3.6+.
Once the installation is done, check out the Linux, Windows 10, or macOS setup guide to finish the installation.
You can download the full source code for this tutorial from my GitHub; it includes an all-in-one Jupyter notebook that walks you through the conversion workflow.
On Windows, run setupvars.bat to set the OpenVINO environment variables in the same command prompt where you will later launch jupyter notebook:

"C:\Program Files (x86)\IntelSWTools\openvino\bin\setupvars.bat"
Or, on Linux, add the following line to your ~/.bashrc:
source /opt/intel/openvino/bin/setupvars.sh
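To quickly check that the environment variables took effect, here is a minimal sketch (assuming the OpenVINO 2019-era Python API used in the rest of this tutorial) you can run from the same shell or notebook:

```python
# Sanity check: if setupvars was sourced correctly, the OpenVINO Python API
# should be importable and PYTHONPATH should point at the OpenVINO install.
import os

print(os.environ.get("PYTHONPATH", "PYTHONPATH is not set"))
try:
    from openvino.inference_engine import IENetwork, IEPlugin  # noqa: F401
    print("OpenVINO Python API found.")
except ImportError as err:
    print("OpenVINO is not on the Python path yet:", err)
```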
Here is an overview of the workflow to convert a Keras model to an OpenVINO model and make a prediction.

1. Save the Keras model as a single .h5 file.
2. Load the .h5 file and freeze the graph to a single TensorFlow .pb file.
3. Run the OpenVINO mo_tf.py script to convert the .pb file to the model IR.
4. Run the model IR with the OpenVINO inference engine.

Save the Keras model as a single .h5 file

For this tutorial, we will load a pre-trained ImageNet classification InceptionV3 model from Keras, make a prediction with it, and then save the whole model to a single .h5 file.
# Force use CPU only.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3 as Net
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.inception_v3 import preprocess_input, decode_predictions
import numpy as np
img_height = 224
model = Net(weights='imagenet')
# Optional image to test model prediction.
img_path = './data/elephant.jpg'
# Path to save the model h5 file.
model_fname = './model/model.h5'
# Load the image for prediction.
img = image.load_img(img_path, target_size=(img_height, img_height))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
# Predicted: [(u'n02504013', u'Indian_elephant', 0.82658225), (u'n01871265', u'tusker', 0.1122357), (u'n02504458', u'African_elephant', 0.061040461)]
# Save the h5 file to path specified.
model.save(model_fname)
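As an optional sanity check (not part of the original workflow), you can reload the saved .h5 file and confirm it reproduces the same top prediction, assuming x, preds, and model_fname from the snippet above are still in scope:

```python
# Optional sanity check: reload the saved .h5 and compare its prediction with
# the in-memory model. Assumes `x`, `preds`, and `model_fname` from above.
from tensorflow.keras.models import load_model

reloaded_model = load_model(model_fname)
reloaded_preds = reloaded_model.predict(x)
assert np.argmax(reloaded_preds[0]) == np.argmax(preds[0])
print('Reloaded model predicted:', decode_predictions(reloaded_preds, top=1)[0])
```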
Freeze the graph to a single .pb file

This step removes any layers and operations not necessary for inference.
import tensorflow as tf
from tensorflow.python.framework import graph_io
from tensorflow.keras.models import load_model
# Clear any previous session.
tf.keras.backend.clear_session()
save_pb_dir = './model'
model_fname = './model/model.h5'
def freeze_graph(graph, session, output, save_pb_dir='.', save_pb_name='frozen_model.pb', save_pb_as_text=False):
    with graph.as_default():
        graphdef_inf = tf.graph_util.remove_training_nodes(graph.as_graph_def())
        graphdef_frozen = tf.graph_util.convert_variables_to_constants(session, graphdef_inf, output)
        graph_io.write_graph(graphdef_frozen, save_pb_dir, save_pb_name, as_text=save_pb_as_text)
        return graphdef_frozen
# This line must be executed before loading Keras model.
tf.keras.backend.set_learning_phase(0)
model = load_model(model_fname)
session = tf.keras.backend.get_session()
INPUT_NODE = [t.op.name for t in model.inputs]
OUTPUT_NODE = [t.op.name for t in model.outputs]
print(INPUT_NODE, OUTPUT_NODE)
frozen_graph = freeze_graph(session.graph, session, [out.op.name for out in model.outputs], save_pb_dir=save_pb_dir)
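Before handing the frozen graph to the model optimizer, you can optionally read it back and inspect it. This is a small sketch using the TensorFlow 1.x API already used above; the file path matches the default save_pb_name written by freeze_graph.

```python
# Optional: read the frozen .pb back and print a few statistics. Uses the
# TF 1.x GraphDef / gfile API, consistent with the freezing code above.
frozen_pb_path = './model/frozen_model.pb'
graph_def = tf.GraphDef()
with tf.gfile.GFile(frozen_pb_path, 'rb') as f:
    graph_def.ParseFromString(f.read())
print('Nodes in the frozen graph:', len(graph_def.node))
print('Input node(s):', INPUT_NODE)
print('Output node(s):', OUTPUT_NODE)
```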
The following snippet runs in the Jupyter notebook. It locates the mo_tf.py model optimizer script for your OS, then calls it with the frozen .pb file, an input shape built from img_height, and the data_type (use FP32 for CPU, or FP16 if you plan to run on the integrated GPU).
import platform
is_win = 'windows' in platform.platform().lower()
# OpenVINO 2019
if is_win:
    # mo_tf.py path on Windows (quoted because of the space in "Program Files (x86)")
    mo_tf_path = r'"C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo_tf.py"'
else:
    # mo_tf.py path on Linux
    mo_tf_path = '/opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py'
pb_file = './model/frozen_model.pb'
output_dir = './model'
img_height = 224
input_shape = [1,img_height,img_height,3]
input_shape_str = str(input_shape).replace(' ','')
input_shape_str
!python {mo_tf_path} --input_model {pb_file} --output_dir {output_dir} --input_shape {input_shape_str} --data_type FP32
After running the script, you will find two new files generated under the ./model directory: frozen_model.xml and frozen_model.bin.
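If you are curious, you can peek inside the generated IR. The sketch below just uses Python's standard XML parser and assumes the IR keeps the usual <net><layers> layout; it is for inspection only and is not an OpenVINO API.

```python
# Optional: count the layers in the generated IR .xml using the standard
# library parser. Assumes the usual <net><layers><layer/></layers> layout.
import xml.etree.ElementTree as ET

ir_root = ET.parse('./model/frozen_model.xml').getroot()
ir_layers = ir_root.find('layers')
print('IR layers:', 0 if ir_layers is None else len(ir_layers))
```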
If you have set up the environment correctly, the directory C:\Intel\computer_vision_sdk\python\python3.5 (or ~/intel/computer_vision_sdk/python/python3.5 on Linux) should be on your PYTHONPATH so that the openvino Python module can be imported.
The following snippet runs the inference engine on the CPU. It is also possible to run on the Intel integrated GPU if you opted for the FP16 data_type previously.
import sys
import os
assert 'computer_vision_sdk' in os.environ['PYTHONPATH']
from PIL import Image
import numpy as np
try:
    from openvino import inference_engine as ie
    from openvino.inference_engine import IENetwork, IEPlugin
except Exception as e:
    exception_type = type(e).__name__
    print("The following error happened while importing Python API module:\n[ {} ] {}".format(exception_type, e))
    sys.exit(1)
def pre_process_image(imagePath, img_height=224):
    # Model input format
    n, c, h, w = [1, 3, img_height, img_height]
    image = Image.open(imagePath)
    processedImg = image.resize((h, w), resample=Image.BILINEAR)
    # Normalize to keep data between 0 - 1
    processedImg = (np.array(processedImg) - 0) / 255.0
    # Change data layout from HWC to CHW
    processedImg = processedImg.transpose((2, 0, 1))
    processedImg = processedImg.reshape((n, c, h, w))
    return image, processedImg, imagePath
# Plugin initialization for specified device and load extensions library if specified.
plugin_dir = None
model_xml = './model/frozen_model.xml'
model_bin = './model/frozen_model.bin'
# Devices: GPU (intel), CPU, MYRIAD
plugin = IEPlugin("CPU", plugin_dirs=plugin_dir)
# Read IR
net = IENetwork.from_ir(model=model_xml, weights=model_bin)
assert len(net.inputs.keys()) == 1
assert len(net.outputs) == 1
input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))
# Load network to the plugin
exec_net = plugin.load(network=net)
del net
# Run inference
fileName = './data/elephant.jpg'
image, processedImg, imagePath = pre_process_image(fileName)
res = exec_net.infer(inputs={input_blob: processedImg})
# Access the results and get the index of the highest confidence score
output_node_name = list(res.keys())[0]
res = res[output_node_name]
# Predicted class index.
idx = np.argsort(res[0])[-1]
# decode the predictions
from tensorflow.keras.applications.inception_v3 import decode_predictions
print('Predicted:', decode_predictions(res, top=3)[0])
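To reproduce a rough timing measurement yourself, here is a minimal sketch, assuming exec_net, input_blob, and processedImg from the snippet above; the warm-up and iteration counts are arbitrary choices.

```python
# Rough latency measurement for the OpenVINO path: a few warm-up calls, then
# average the inference time over repeated runs. Iteration counts are arbitrary.
import time

warmup_runs, timed_runs = 10, 100
for _ in range(warmup_runs):
    exec_net.infer(inputs={input_blob: processedImg})
start = time.time()
for _ in range(timed_runs):
    exec_net.infer(inputs={input_blob: processedImg})
avg_sec = (time.time() - start) / timed_runs
print('OpenVINO(CPU) average(sec):{:.3f}, fps:{:.1f}'.format(avg_sec, 1.0 / avg_sec))
```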
Benchmark setup: the same model and test image were used in all three environments. Benchmark results for Keras, TensorFlow, and OpenVINO are shown below.
Keras average(sec):0.079, fps:12.5
TensorFlow average(sec):0.069, fps:14.3
OpenVINO(CPU) average(sec):0.024, fps:40.6
The results might vary with the Intel processor you are experimenting with, but you can expect a significant speedup compared to running inference with TensorFlow / Keras on the CPU backend.
In this tutorial, you have learned how to run model inference several times faster with your Intel processor and the OpenVINO toolkit compared to stock TensorFlow. OpenVINO can accelerate inference on more than just the CPU: the same workflow introduced in this tutorial can easily be adapted to a Movidius Neural Compute Stick.
OpenVINO documentation you might find helpful:
Install Intel® Distribution of OpenVINO™ toolkit for Windows* 10
Install the Intel® Distribution of OpenVINO™ toolkit for Linux*
OpenVINO - Advanced Topics - CPU Plugin, where you can learn more about various model optimization techniques.