Build a DIY security camera with neural compute stick (part 2)



The previous post showed how to build a security camera powered by a Raspberry Pi and a Movidius neural compute stick (NCS) that detects people in real time and saves only the useful image frames. This time, I will walk you through just enough of the source code to add an Arduino servo turret that automatically turns the camera to follow the people it detects.

How does the current code work?

Whether you come from programming Arduino in C or are more comfortable coding in Python, getting a grasp of how the code works shouldn't be too hard, considering that our aim is only to modify it and add some servo logic. No prior deep learning knowledge is required to dig into the code and customize it, since the deep learning model is already packed into a computation graph, a specially formatted file that tells the neural compute stick how to take the input image and generate the object detection outputs. The computation graph was compiled when you ran the make run_cam command previously, which invokes the mvNCCompile command behind the scenes.

At startup, the NCS is found and connected, and the compiled computation graph is allocated into its memory. The Python OpenCV library then opens the first available webcam in your system, which is what camera index 0 stands for. All setup is complete at this point. In the main loop, each captured webcam image is preprocessed by scaling it to a specific resolution and normalizing its values as the model expects.

Next, all the heavy lifting of object detection is done in just two lines: an image goes into the NCS and the detection result comes out. The final section points to alternative API calls that do this asynchronously.

# Send the image to the NCS as 16 bit floats
ssd_mobilenet_graph.LoadTensor(resized_image.astype(numpy.float16), None)

# Get the result from the NCS
output, userobj = ssd_mobilenet_graph.GetResult()

The SSD MobileNet object detection model applied here can detect 20 different classes of objects, such as person, bottle, car, and dog. Since we are only interested in detecting people, as a security camera does, we tell the detector to ignore the other classes, which is done through the object_classifications_mask variable.

The output is an array of floating point numbers formatted in the following way.

#   a.	First fp16 value holds the number of valid detections = num_valid.
#   b.	The next 6 values are unused.
#   c.	The next (7 * num_valid) values contain the valid detections data
#       Each group of 7 values describes one object/box.
#       The values, in order, are:
#         0: image_id (always 0)
#         1: class_id (this is an index into labels)
#         2: score (this is the probability for the class)
#         3: box left location within an image as a number between 0.0 and 1.0
#         4: box top location within an image as a number between 0.0 and 1.0
#         5: box right location within an image as a number between 0.0 and 1.0
#         6: box bottom location within an image as a number between 0.0 and 1.0

From there, the code filters the results based on your minimum score setting and the classification mask, then overlays the bounding boxes on the image.
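
To make the array layout and the filtering step concrete, here is a minimal sketch of how such an output could be decoded in plain Python. The names Detection and parse_detections, and the three-entry label list in the usage note, are made up for illustration and are not taken from the repo; the real code may also guard against invalid values, which this sketch skips.

```python
from typing import List, NamedTuple, Sequence


class Detection(NamedTuple):
    """One decoded box; coordinates are fractions of the image size."""
    class_id: int
    score: float
    left: float
    top: float
    right: float
    bottom: float


def parse_detections(output: Sequence[float],
                     class_mask: Sequence[int],
                     min_score: float) -> List[Detection]:
    """Decode the flat output array and keep only wanted, confident boxes.

    class_mask holds one 0/1 flag per class index (1 = keep), in the same
    spirit as object_classifications_mask. min_score uses the same scale
    as the scores stored in the array.
    """
    num_valid = int(output[0])
    detections = []
    for i in range(num_valid):
        # Skip the count and the 6 unused values, then step 7 values per box.
        base = 7 + i * 7
        _image_id, class_id, score, left, top, right, bottom = output[base:base + 7]
        class_id = int(class_id)
        if not 0 <= class_id < len(class_mask) or class_mask[class_id] == 0:
            continue  # class is masked out or out of range
        if score < min_score:
            continue  # not confident enough
        detections.append(Detection(class_id, float(score), float(left),
                                    float(top), float(right), float(bottom)))
    return detections
```

With a hypothetical label list of ["background", "person", "dog"], a mask of [0, 1, 0] keeps only the boxes whose class_id is 1.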

That is pretty much all the code does. It is easy to add functionality that counts how many people are detected in an image and saves the frame when this number changes, so we know when a person enters or leaves the view.
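
The "save when the count changes" idea can be sketched with a tiny helper that remembers the previous count. PeopleCounter is a hypothetical name used only for this illustration, and the sketch assumes you already have the number of people detected in the current frame.

```python
class PeopleCounter:
    """Remember the last person count and report when it changes."""

    def __init__(self) -> None:
        self.last_count = 0

    def update(self, count: int) -> bool:
        """Store the new count; return True if it differs from the previous one."""
        changed = count != self.last_count
        self.last_count = count
        return changed
```

In the main loop you would call update() once per frame with the number of people found, and write the frame to disk (for example with OpenCV's cv2.imwrite) only when it returns True.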

Adding a servo

Say your security camera is battery powered, Wi-Fi enabled, and mounted in a sweet spot that oversees a broad angle of view, say 300 degrees, covering your garage, yard, and the street all at the same time. Some security cameras have controllable turrets that let you turn the camera manually to face different angles, but they don't turn by themselves when the subject moves out of view. The security camera we are building detects people in real time even when there is no internet connection, and it is intelligent enough to know where the people are and which way to turn to face them. We are only missing one thing: the moving part, a turret!

Arduino sketch talking to the servo

A cheap hobby servo motor like the SG90 should do the job, since it doesn't take much torque to turn the webcam around. How does a servo work? The length of a pulse sets the position of the servo motor. The servo expects to receive a pulse roughly every 20 milliseconds. If that pulse is high for 1 millisecond, the servo angle will be zero; if it is high for 1.5 milliseconds, the servo will be at its center position; and if it is high for 2 milliseconds, it will be at 180 degrees.
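
The pulse timing described above amounts to a linear mapping from angle to pulse width. The sketch below only illustrates that arithmetic, using the nominal 1 ms to 2 ms range and 20 ms period from the text; real servos often deviate from these nominal values, and the function names are made up for this example.

```python
def angle_to_pulse_ms(angle_deg: float,
                      min_ms: float = 1.0,
                      max_ms: float = 2.0,
                      max_angle: float = 180.0) -> float:
    """Map a servo angle to the nominal high time of its control pulse."""
    # Clamp the request so we never ask for a pulse outside the range.
    angle = max(0.0, min(max_angle, angle_deg))
    return min_ms + (max_ms - min_ms) * angle / max_angle


def pulse_to_duty_cycle(pulse_ms: float, period_ms: float = 20.0) -> float:
    """Fraction of each period during which the signal is high."""
    return pulse_ms / period_ms
```

For instance, the center position (90 degrees) maps to a 1.5 ms pulse, which is a 7.5% duty cycle at a 20 ms period.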


There are three benefits to using an Arduino to control the servo angle, even though the Raspberry Pi already comes with PWM pins.

  1. Using an Arduino makes the main Python code compatible with other platforms. For example, you can connect the Arduino to an Ubuntu PC and run the same demo there.
  2. The Arduino can do precise speed control for the servo every 20 ms, which would be an overhead for the Raspberry Pi to handle by itself.
  3. It provides an extra layer of protection for the Raspberry Pi's digital pins.

A servo has three wires, typically colored brown, red, and orange, for ground, power, and signal. Connect ground (brown) to the Arduino's GND, power (red) to the Arduino's 5V pin or an external 5V battery, and signal (orange) to the Arduino's pin 4, as mapped in my sketch. Keep in mind that a high-torque servo may require a higher voltage and more current, so it is safer to power it from an external source, such as a 12V lithium battery pack with its voltage stepped down to the desired level by a DC-DC converter like the LM2596 module.

I picked an Arduino Pro Micro board, which is essentially a minimal version of the Arduino Leonardo, tiny but versatile; other Arduino-compatible boards should work as well. The code for the Arduino is freely available on my GitHub. It accepts serial port commands to turn the servo either by a relative angle or to an absolute angle.

The trick it performs behind the scenes is to smooth the servo's movement by limiting how many degrees it turns within a specific interval. Feel free to give it a try; instructions are included in the readme.
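
The smoothing idea can be illustrated in a few lines. The actual Arduino sketch is written in C and is not reproduced here, so treat the following Python as a sketch of the general technique (limiting the step per interval) rather than the author's implementation; step_toward and the max_step value are assumptions made for the example.

```python
def step_toward(current: float, target: float, max_step: float) -> float:
    """Move from current toward target by at most max_step degrees."""
    delta = target - current
    if abs(delta) <= max_step:
        return target  # close enough to land exactly on the target
    return current + max_step if delta > 0 else current - max_step


def plan_moves(start: float, target: float, max_step: float) -> list:
    """List the intermediate angles, one per control interval."""
    positions = []
    current = start
    while current != target:
        current = step_toward(current, target, max_step)
        positions.append(current)
    return positions
```

Calling step_toward once per control interval (for example every 20 ms) turns one big jump into a series of small moves, so a smaller max_step gives a slower, gentler sweep.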

Follow this diagram to connect all parts.


Turn the turret in Python

Having an Arduino co-processor makes life much easier for the Raspberry Pi. All it needs to do is send commands telling the Arduino to turn the servo left or right by some number of degrees. The decision to turn left or right comes from calculating the offset between the center of the detected person (or of multiple people) and the center of the view. If the offset is greater than some threshold, a turn command is issued. That is the essence of the logic: super simple, but it works.
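
Here is a minimal sketch of that decision, assuming boxes are given as normalized (left, top, right, bottom) tuples like the detector output. The function name, the threshold, and the degrees_per_unit gain are hypothetical values chosen for illustration, and averaging the box centers is just one reasonable way to handle multiple people. Whether a positive result means "turn left" or "turn right" depends on how your servo is mounted.

```python
from typing import Sequence, Tuple

# left, top, right, bottom as fractions of the image size (0.0 to 1.0)
Box = Tuple[float, float, float, float]


def turn_command(boxes: Sequence[Box],
                 threshold: float = 0.1,
                 degrees_per_unit: float = 60.0) -> float:
    """Return a relative angle to turn by; 0.0 means stay put.

    A positive value means the people are right of the image center.
    """
    if not boxes:
        return 0.0  # nobody in view, nothing to follow
    # Horizontal center of each box, then the average over all people.
    centers = [(left + right) / 2.0 for left, _top, right, _bottom in boxes]
    offset = sum(centers) / len(centers) - 0.5
    if abs(offset) <= threshold:
        return 0.0  # already roughly centered, avoid jitter
    return offset * degrees_per_unit
```

The threshold acts as a dead zone: without it, small frame-to-frame changes in the boxes would make the turret twitch constantly.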

The Python code automatically locates the Arduino serial port by enumerating the descriptions of the available serial ports. Make sure the pySerial library is installed on your Raspberry Pi.

pip3 install pyserial
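
As a rough illustration of the port lookup, the sketch below scans port descriptions for a keyword. It relies on pySerial's serial.tools.list_ports.comports(), whose entries expose device and description attributes. The keyword list is an assumption: description strings differ between boards and operating systems, so check what your own board reports. The function names are made up for this example and are not taken from the repo.

```python
from typing import Iterable, Optional, Sequence


def pick_port(ports: Iterable,
              keywords: Sequence[str] = ("arduino", "leonardo")) -> Optional[str]:
    """Return the device name of the first port whose description matches."""
    for port in ports:
        description = (port.description or "").lower()
        if any(keyword in description for keyword in keywords):
            return port.device
    return None  # no matching port found


def find_arduino_port() -> Optional[str]:
    """Scan the real serial ports on this machine (requires pySerial)."""
    from serial.tools import list_ports  # imported lazily, provided by pySerial
    return pick_port(list_ports.comports())
```

Once a port name such as /dev/ttyACM0 is found, it can be passed to serial.Serial() to open the connection to the Arduino.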

Run the demo by typing the following command in the Pi's console and watch the magic happen.

make run_gimbal


Conclusion and further reading

As you experiment with the camera demo, you may only get around 6 frames per second (fps). However, this can be boosted to around 8.69 fps by leveraging the asynchronous FIFO API that comes with NCSDK2. That is beyond the scope of this simple tutorial, but the code is available in my GitHub repo under video_objects_threaded. It is based on the official example found in the folder ncappzoo/apps/video_objects_threaded. Expect to run it the same way as the previous demos in video_objects. Watch the demo video here if you haven't yet. Don't hesitate to post a comment if you get stuck.

Useful links

Getting Started with Arduino and Genuino products


GitHub repositories

ncsdk2 -

NCSDK2 branch ncappzoo -

My video objects detection repo -

Video objects detection with threads and FIFO achieves 8.69 fps on Pi
