How to run a Keras model on the Movidius Neural Compute Stick



The Movidius Neural Compute Stick (NCS), along with other hardware devices such as the UP AI Core, the AIY Vision Bonnet and the recently revealed Google Edge TPU, is gradually bringing deep learning to resource-constrained IoT devices. Are we one step closer to building the DIY hunter-killer drone you have been waiting for? The powerful GPU once needed for serious deep learning image processing can now be shrunk down to a more plug-and-play size; think of it as a sort of mini neural network on the go. Stop me if this is beginning to sound a little too "Terminator" for comfort.

As a maker and programmer familiar with the Keras deep learning framework, odds are you can deploy the model you trained to the NCS. In this tutorial, I will show you how easy it is to train a simple MNIST Keras model and deploy it to an NCS, which can be connected to either a PC or a Raspberry Pi.

There are several steps:

  1. Train the model in Keras (TensorFlow backend)
  2. Save the model file and weights in Keras
  3. Turn the Keras model into a TensorFlow model
  4. Compile the TensorFlow model to an NCS graph
  5. Deploy and run the graph on the NCS

Let's have a look at each of them.

Training and saving the Keras model

Just as printing "Hello world!" to the console is a programmer's first exercise, training a handwritten-digit MNIST model is the equivalent for deep learning programmers.

Here is a Keras model that does the job just fine, with several convolutional layers followed by a final output stage. The complete code is on my GitHub; here is a quick snippet to show the idea.

from keras import layers, models

model = models.Sequential()
model.add(layers.Conv2D(16, 3, activation='relu', input_shape=(28, 28, 1)))
model.add(layers.Conv2D(32, 3, activation='relu'))
model.add(layers.Conv2D(64, 3, activation='relu'))
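# Flatten the feature maps so the Dense layers receive one vector per image
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))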
model.add(layers.Dense(10, activation='softmax'))

model.compile(optimizer='adam', metrics=['accuracy'], loss='categorical_crossentropy')

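# x_train: images of shape (N, 28, 28, 1); y_train: one-hot labels of shape (N, 10)
history = model.fit(x_train, y_train, epochs=2, batch_size=128)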

output = model.predict(test_image.reshape(1, 28, 28, 1))[0]
print("Keras \r\n", output, '\r\nPredicted:',output.argmax())

Training may take about 3 minutes on a GPU, or longer on a CPU. By the way, if you don't have a GPU training machine available right now, you can check out my previous tutorial on how to train your model on Google's GPU free of charge; all you need is a Gmail account.

Either way, after training, save the model and weights into two separate files like this.

with open("model.json", "w") as file:
    file.write(model.to_json())
model.save_weights("weights.h5")

Alternatively, you can call model.save('model.h5', include_optimizer=False) to save the model in one file. Notice that we exclude the optimizer by setting include_optimizer to False, since the optimizer is only used for training.
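Because model.json is plain JSON, it can be inspected without Keras, for instance to list the layer types the NCS will have to support. The following is only a minimal sketch: the helper name layer_class_names is hypothetical, and it assumes a Sequential model whose top-level "config" is either a list of layers or a dict holding that list under "layers", depending on the Keras release.

```python
import json


def layer_class_names(model_json):
    """Return the class name of every layer in a Keras architecture JSON string.

    Assumption: the top-level "config" is either a list of layer dicts or a
    dict holding that list under "layers"; other layouts are not handled.
    """
    config = json.loads(model_json)["config"]
    layers = config["layers"] if isinstance(config, dict) else config
    return [layer["class_name"] for layer in layers]


# Hand-written stand-in for the contents of model.json (not real Keras output)
example_json = json.dumps({
    "class_name": "Sequential",
    "config": [
        {"class_name": "Conv2D", "config": {"filters": 16}},
        {"class_name": "Dense", "config": {"units": 10}},
    ],
})
print(layer_class_names(example_json))
```

With the real file you would pass the string read from model.json instead of example_json.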

Turn the Keras model into a TensorFlow model

Since Movidius NCSDK2 only compiles TensorFlow or Caffe models, we will peel away the Keras binding to expose the underlying TensorFlow graph. The following code handles the work; let's see how it works in case you want to customize it in the future.

from keras.models import model_from_json
from keras import backend as K
import tensorflow as tf

model_file = "model.json"
weights_file = "weights.h5"

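# Turn off the learning phase so layers run in inference mode
K.set_learning_phase(0)

# Rebuild the architecture from JSON, then load the trained weights
with open(model_file, "r") as file:
    config = file.read()

model = model_from_json(config)
model.load_weights(weights_file)

# Save the TensorFlow session behind Keras as a checkpoint
saver = tf.train.Saver()
sess = K.get_session()
saver.save(sess, "./TF_Model/tf_model")

# Write the graph to the 'logs' folder for inspection in TensorBoard
fw = tf.summary.FileWriter('logs', sess.graph)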

First we turn off the learning phase, then the model is loaded in the standard Keras way from the two separate files we saved previously.

By calling K.get_session() from Keras with the TensorFlow backend, the default TensorFlow session becomes available. You can explore what's inside the TensorFlow graph by calling sess.graph.get_operations(), which returns a list of the TensorFlow operations in your model. That is useful when hunting for operations not supported by the NCS, since you can trace them down from the list. Finally, the TensorFlow Saver class saves the model as four files in the specified path.

Each file serves a different purpose:

  1. checkpoint defines the model checkpoint path, which is "tf_model" in our case.
  2. .meta stores the graph structure.
  3. .data stores the values of each variable in the graph.
  4. .index identifies the checkpoint.
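Before moving on to compilation, it can help to confirm that all four files were actually written. Here is a small sketch using only the standard library; the helper name missing_checkpoint_files is hypothetical, and it assumes the variable values are stored in one or more files named <prefix>.data-*.

```python
import glob
import os


def missing_checkpoint_files(directory, prefix):
    """Return the expected checkpoint files that are absent from `directory`.

    Assumption: the Saver wrote one `checkpoint` file plus `<prefix>.meta`,
    `<prefix>.index` and at least one `<prefix>.data-*` shard.
    """
    missing = []
    for name in ("checkpoint", prefix + ".meta", prefix + ".index"):
        if not os.path.isfile(os.path.join(directory, name)):
            missing.append(name)
    # The variable values may be split over several shards, so match by pattern
    pattern = os.path.join(glob.escape(directory), glob.escape(prefix) + ".data-*")
    if not glob.glob(pattern):
        missing.append(prefix + ".data-*")
    return missing


# An empty list means the checkpoint from the previous step looks complete
print(missing_checkpoint_files("./TF_Model", "tf_model"))
```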

Compile TensorFlow model with mvNCCompile

The mvNCCompile command line tool that comes with the NCSDK2 toolkit converts Caffe or TensorFlow networks to graph files that can be used by the Movidius Neural Compute Platform API. We specify the input and output nodes as TensorFlow operation names for mvNCCompile during graph generation. You can find a list of TensorFlow operations by calling sess.graph.get_operations(), as shown in the previous section. In our case, we located 'conv2d_1_input' as the input node and 'dense_2/Softmax' as the output node.

Finally, the compile command looks like this:
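If scanning the operation list by eye gets tedious, the search can be scripted. The sketch below is an illustration under assumptions, not the method used to find the names above: guess_io_nodes is a hypothetical helper that takes the first Placeholder as input and the last Softmax as output, and the FakeOp records only mimic the .name and .type attributes of real TensorFlow operations.

```python
from collections import namedtuple


def guess_io_nodes(operations):
    """Guess input/output node names from a list of graph operations.

    Each operation only needs `.name` and `.type` attributes, as
    tf.Operation objects have. Assumption: the input is the first
    Placeholder and the output is the last Softmax in the list.
    """
    inputs = [op.name for op in operations if op.type == "Placeholder"]
    outputs = [op.name for op in operations if op.type == "Softmax"]
    if not inputs or not outputs:
        raise ValueError("no Placeholder or Softmax operation found")
    return inputs[0], outputs[-1]


# Stand-in operations for illustration; with a real session you would pass
# sess.graph.get_operations() instead.
FakeOp = namedtuple("FakeOp", ["name", "type"])
ops = [
    FakeOp("conv2d_1_input", "Placeholder"),
    FakeOp("conv2d_1/convolution", "Conv2D"),
    FakeOp("dense_2/BiasAdd", "BiasAdd"),
    FakeOp("dense_2/Softmax", "Softmax"),
]
print(guess_io_nodes(ops))
```

A graph with extra placeholders or several softmax layers would need a closer look, so treat the result as a starting point.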

mvNCCompile TF_Model/tf_model.meta -in=conv2d_1_input -on=dense_2/Softmax

A default graph file named "graph" will be generated in the current directory.
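If you prefer to drive the compile step from a Python script, the same command can be assembled and launched with the standard library. This is only a sketch: build_compile_command is a hypothetical helper, and it uses nothing beyond the two options shown in the command above.

```python
import shutil
import subprocess


def build_compile_command(meta_path, input_node, output_node):
    """Build the mvNCCompile argument list used in this tutorial."""
    return [
        "mvNCCompile",
        meta_path,
        "-in=" + input_node,
        "-on=" + output_node,
    ]


command = build_compile_command(
    "TF_Model/tf_model.meta", "conv2d_1_input", "dense_2/Softmax")
print(" ".join(command))

# Only attempt the compilation when the NCSDK2 tool is actually installed
if shutil.which("mvNCCompile"):
    subprocess.run(command, check=True)
```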

Deploy the graph and make a prediction

The NCSDK2 Python API takes over from here: find an NCS device, connect to it, allocate the graph to its memory and make a prediction.

The following code shows the essential part; input_img is the pre-processed image as a NumPy array of shape (28, 28).

The output is the same as with Keras: ten numbers representing the classification probabilities for each of the ten digits. We apply the argmax function to find the index of the most likely prediction.

from mvnc import mvncapi as mvnc
# Enumerate all NCS devices attached to the host
devices = mvnc.enumerate_devices()
# For this program we will always use the first NCS device
dev = mvnc.Device(devices[0])
# Open communication with the device before allocating a graph
dev.open()
# Read a compiled network graph from file (set the graph_filepath correctly for your graph file)
with open("graph", mode='rb') as f:
    graphFileBuff = f.read()

graph = mvnc.Graph('graph1')

# Allocate the graph on the device and create input and output Fifos
in_fifo, out_fifo = graph.allocate_with_fifos(dev, graphFileBuff)

# Write the input to the input_fifo buffer and queue an inference in one call
graph.queue_inference_with_fifo_elem(in_fifo, out_fifo, input_img.astype('float32'), 'user object')

# Read the result from the output Fifo
output, userobj = out_fifo.read_elem()
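Two follow-up steps are worth sketching: ranking the returned probabilities and releasing the device when you are done. Both helper names below are hypothetical, and the example probabilities are made up. release_ncs assumes the objects are the device, graph and Fifos created with the NCSDK2 API v2 calls above.

```python
def top_predictions(probabilities, k=3):
    """Return the k most likely (digit, probability) pairs, best first."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return [(digit, float(prob)) for digit, prob in ranked[:k]]


def release_ncs(dev, graph, in_fifo, out_fifo):
    """Free the Fifos and graph, then close and destroy the device handle."""
    in_fifo.destroy()
    out_fifo.destroy()
    graph.destroy()
    dev.close()
    dev.destroy()


# Made-up probabilities standing in for the `output` read from the NCS
example_output = [0.01, 0.02, 0.05, 0.80, 0.02, 0.03, 0.02, 0.02, 0.02, 0.01]
print(top_predictions(example_output, k=2))
```

top_predictions works on any sequence of numbers, so the NumPy array returned by read_elem() can be passed in directly.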

The Keras model is running on the NCS now! You can call it a day here, or further enhance the demo by adding a webcam to read live images and running it on a Raspberry Pi single board computer instead of an Ubuntu PC. Check out the video demo here.

Bonus - predicting with webcam live image on Raspberry Pi

Installing NCSDK2 on a Pi may take dozens of minutes, which is bad news for the impatient. The good news is that you can choose to install only the essential part of NCSDK2 on your Pi to run inference with the graph compiled on your Ubuntu PC.

First, instead of cloning the NCSDK2 repository to your Pi, which could take quite a while, download a released zip file of NCSDK2. This saves considerable disk space since all git version control files are skipped.

Secondly, skip the TensorFlow and Caffe installation during the NCSDK2 installation by modifying the ncsdk.conf file.


Running a live webcam requires OpenCV 3; running the following four lines in a terminal does the job on your Pi.

sudo pip3 install opencv-python==
sudo apt-get update
sudo apt-get install libqtgui4
sudo apt-get install python-opencv

Once the NCSDK2 and OpenCV 3 installations are done, copy the graph file you generated to your Pi. Just remember that, since we skipped quite a lot of stuff, the mvNC** commands will not run on your Pi because they depend on the Caffe and TensorFlow installations.

The MNIST model was trained to recognize white handwritten digits on the black background of a 28x28 grayscale image, so some pre-processing steps are necessary to convert a captured image.

  1. Crop the center region of the image
  2. Do edge detection to find the edges in the image; this step also turns the image to grayscale
  3. Dilate the edges, which makes them thicker and fills the areas between two close parallel edges
  4. Resize the image to 28 x 28
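The four steps above can be sketched with OpenCV roughly as follows. This is my own illustration under assumptions rather than the exact code from the demo: the function names, the crop size and the Canny thresholds are hypothetical, the frame is converted to grayscale explicitly before edge detection, and the result is scaled to the 0 to 1 range on the assumption that the model was trained on inputs in that range.

```python
def center_crop_box(width, height, size):
    """Return (left, top, right, bottom) of a centered square of side `size`."""
    size = min(size, width, height)
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def preprocess_frame(frame, crop_size=200, low=50, high=150):
    """Turn a BGR webcam frame into a (28, 28) float32 image in [0, 1].

    crop_size, low and high are illustrative values, not tuned ones.
    """
    import cv2  # imported here so center_crop_box works without OpenCV
    import numpy as np

    height, width = frame.shape[:2]
    left, top, right, bottom = center_crop_box(width, height, crop_size)
    cropped = frame[top:bottom, left:right]            # 1. crop the center
    gray = cv2.cvtColor(cropped, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)                 # 2. white edges on black
    kernel = np.ones((3, 3), np.uint8)
    thick = cv2.dilate(edges, kernel, iterations=2)    # 3. thicken the edges
    small = cv2.resize(thick, (28, 28),
                       interpolation=cv2.INTER_AREA)   # 4. resize to 28 x 28
    return small.astype("float32") / 255.0


print(center_crop_box(640, 480, 200))
```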

With that in mind, a similar implementation with Python OpenCV 3 can be found in the complete code. To wrap up the webcam demo, for each captured frame we pass it to the image pre-processing function, then feed it to the NCS graph, which returns the final prediction probabilities as before. From there, we present the predicted result as an overlay on the image shown on the display.
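A capture loop along those lines might look like the sketch below. It is an assumption-based outline, not the demo's actual code: overlay_text and run_webcam_demo are hypothetical names, and the pre-processing and NCS inference steps are passed in as callables so the loop stays independent of how they are implemented.

```python
def overlay_text(probabilities):
    """Format the most likely digit and its probability for display."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return "Digit: {} ({:.0%})".format(best, float(probabilities[best]))


def run_webcam_demo(preprocess, predict, camera_index=0):
    """Show live predictions until 'q' is pressed.

    preprocess(frame) and predict(image) are callables supplied by the
    caller; predict is expected to return ten probabilities.
    """
    import cv2  # imported here so overlay_text works without OpenCV

    capture = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            text = overlay_text(predict(preprocess(frame)))
            cv2.putText(frame, text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                        1.0, (0, 255, 0), 2)
            cv2.imshow("NCS MNIST", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()


print(overlay_text([0.25, 0.5, 0.25]))
```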


Conclusion and further reading

Now you have deployed a Keras model to the NCS. Keep in mind that since the NCS was built as a "vision processing unit", it supports convolutional layers along with some others, while recurrent neural network layers like LSTM and GRU might not work on it. In our demo, we tell mvNCCompile to take the final classification output node, but it is also possible to use an intermediate layer as the output node, in which case the model serves as a feature extractor, similar to how the NCS FaceNet facial verification demo works.

Some useful resources:

TensorFlow Save and Restore

Build a DIY security camera with neural compute stick series
