DIY Object Detection Doodle camera with Raspberry Pi (part 1)



Let's create a camera that creates and prints some art. Check out the demo video to see the outcome.

After reading this tutorial, you will know how to make such a camera by putting the following pieces together.

  • Voice activation with Porcupine to trigger the image capture
  • Capturing webcam images on the Raspberry Pi
  • Object detection with the TensorFlow object detection API
  • Doodling the detected objects
  • Printing the drawing with a mini thermal receipt printer
  • Adding a shutter push button and an indicator LED to your Pi

Before getting started, make sure you have the following stuff ready.

  1. Raspberry Pi model 3 or above with Raspbian 9 (Stretch) installed. Other models are untested and might require some tweaks to the source code.
  2. USB WebCam with built-in microphone, like the Logitech C920.
  3. A mini thermal receipt printer like the one on
  4. A USB to TTL adapter to connect the mini thermal receipt printer to the Pi, like the CP2102 chipset based adapter on Amazon.

Optional stuff you can get by without, but which can make your camera look slick.

  1. A 7.4V 2c LiPo battery and a DC-DC converter module rated for 3A or more of output current to step the voltage down to 5V, like the LM2596 DC to DC Buck Converter on Amazon.
  2. A power switch to turn the camera on and off.
  3. A single LED module to show the current state.
  4. A single push button module to manually trigger the image capture shutter.

All source code and data resources are packaged into a single file. Download it from my GitHub release and extract it on your Pi with the following command.

tar -xzf v1.1.tar.gz

Now we are ready to get the pieces together.

Voice activation with Porcupine

The Porcupine we are talking about isn't the cute rodent with sharp spines but an on-device wake word detection engine powered by deep learning. If you have read my previous post, How to do Real Time Trigger Word Detection with Keras, you will know what I am talking about. This time, though, the engine is so lightweight that it even runs on a Raspberry Pi with excellent accuracy. Porcupine is cross-platform and also runs on Android, iOS, watchOS, Linux, Mac, and Windows. No special training is needed: specify the keyword you want the engine to detect by passing the text string into its optimizer, which generates a keyword file. The engine can take multiple activation words to trigger different actions in the program. In our case, we will use the voice command "blueberry" to trigger the object detection doodle style and "pineapple" to trigger another, edge detection doodling style.
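To make the "multiple activation words, different actions" idea concrete, here is a minimal sketch of a dispatch table. It assumes the engine reports which keyword it heard as an index into the keyword list it was given, with a negative value when nothing was detected. The handler names and the keyword order are hypothetical and only serve as an illustration; they are not taken from the release code.

```python
# Hypothetical sketch: map a detected wake word to a doodle style.
# Keyword order and handler names are assumptions for illustration only.

KEYWORDS = ["blueberry", "pineapple"]


def object_detection_doodle():
    # Placeholder for the object detection doodle pipeline.
    return "object detection doodle"


def edge_detection_doodle():
    # Placeholder for the edge detection doodle pipeline.
    return "edge detection doodle"


ACTIONS = {
    "blueberry": object_detection_doodle,
    "pineapple": edge_detection_doodle,
}


def handle_detection(keyword_index):
    """Run the action for the detected keyword; return None if nothing was heard."""
    if keyword_index is None or keyword_index < 0:
        return None
    keyword = KEYWORDS[keyword_index]
    return ACTIONS[keyword]()
```

Keeping the mapping in a dictionary means a third style could be added by registering one more keyword and handler, without touching the audio loop.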

To install the dependencies for Porcupine, run the following commands in a terminal on your Pi.

sudo apt update
sudo apt install -y libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg libav-tools
sudo pip3 install pyaudio soundfile -i

The PyAudio library lets you open an input audio stream, which is then monitored using Porcupine's library. My modified Porcupine Python library automatically locates the library path for the current OS; in our case, a Raspberry Pi 3 or above will load the binary ./lib/raspberry-pi/cortex-a53/ . In case you are using a Raspberry Pi model below model 3, its CPU microarchitecture is different, so check out this page and add its library files accordingly.

My code assumes you have plugged a USB audio input device into your Pi, which could be either a WebCam with a built-in microphone or a USB microphone adapter. In case you use another type of microphone, you can find the audio input device index by running the file then pass it in as the input_device_index argument to the PorcupineDemo class in file

To verify that voice activation is working on your Pi, plug in a WebCam or a USB microphone and run these commands in a terminal.

cd voice-camera/porcupine
python3 ./examples/ --keywords pineapple

If everything goes well, you will see something like this while it waits for you to say the keyword "pineapple".


A list of other pre-optimized keywords you can use is located in the ./porcupine/resources/keyword_files folder.

One side note: the Porcupine library is quite accurate by itself, but the result can be affected considerably by the microphone. Some WebCams with only one built-in microphone can barely capture voice clearly within a limited range, while the two-microphone array on the Logitech C920 webcam can cancel noise in the environment and record your voice loud and clear even at an extended distance.
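If you are unsure which index to pass as input_device_index for your microphone, the following sketch shows one way to list the devices that can record audio. It is not the helper script from the release; the function name find_input_devices is hypothetical. It relies only on the device info dictionaries that PyAudio returns, which carry the keys index, name and maxInputChannels.

```python
def find_input_devices(device_infos):
    """Return (index, name) for every device that can record audio."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]

# On the Pi, the device list would come from PyAudio:
#   import pyaudio
#   pa = pyaudio.PyAudio()
#   infos = [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
#   pa.terminate()
#   print(find_input_devices(infos))
```

Output-only devices such as HDMI audio report zero input channels, so they are filtered out and only usable microphones remain.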

TF Object detection with live WebCam

Once the app is voice activated, the software will let the webcam capture images and try to locate objects inside.

Capturing images from the WebCam requires the OpenCV3 library to be installed on your Pi. People used to sink hours into compiling the source code and installing it on the Pi, which can now be avoided by running these three lines in a terminal.

pip3 install opencv-python==
sudo apt-get install libqtgui4
sudo apt-get install python-opencv
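
With OpenCV installed, grabbing a single frame could look roughly like the sketch below. This is an illustration rather than the code shipped in the release: capture_frame is a hypothetical helper that accepts any object behaving like cv2.VideoCapture, and the device index 0 in the usage comment is an assumption (it is usually the first webcam).

```python
def capture_frame(capture):
    """Read one frame from an opened capture object and release the device.

    `capture` is expected to behave like cv2.VideoCapture: read() returns
    (ok, frame) and release() frees the camera.
    """
    try:
        ok, frame = capture.read()
    finally:
        # Always free the camera, even if reading fails.
        capture.release()
    if not ok:
        raise RuntimeError("Could not read a frame from the webcam")
    return frame

# On the Pi (device index 0 is an assumption):
#   import cv2
#   frame = capture_frame(cv2.VideoCapture(0))
#   cv2.imwrite("capture.jpg", frame)
```

Releasing the device right after the read keeps the webcam free for other programs between shots.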

The captured photo is fed into the TensorFlow object detection API, and the model returns four pieces of information:

  1. the bounding boxes of the detected objects in the image,
  2. the detection confidence score for each box,
  3. the class label for each object,
  4. the total number of detections.
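
As a rough sketch of how these four outputs can be combined, the function below keeps only the confident detections and converts their boxes to pixel coordinates. It assumes the boxes are normalized [ymin, xmin, ymax, xmax] values, which is the layout the TensorFlow object detection API uses. The function name select_detections and the 0.5 threshold are hypothetical choices for illustration.

```python
def select_detections(boxes, scores, classes, num, image_width, image_height,
                      min_score=0.5):
    """Keep confident detections and convert boxes to pixel coordinates.

    boxes are assumed to be normalized [ymin, xmin, ymax, xmax] values.
    """
    results = []
    for i in range(int(num)):
        if scores[i] < min_score:
            continue
        ymin, xmin, ymax, xmax = boxes[i]
        results.append({
            "class_id": int(classes[i]),
            "score": float(scores[i]),
            # Pixel box as (left, top, right, bottom).
            "box": (
                int(xmin * image_width),
                int(ymin * image_height),
                int(xmax * image_width),
                int(ymax * image_height),
            ),
        })
    return results
```

The class IDs still need to be looked up in the label map that ships with the model to get human-readable names.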

The model we use for object detection is an SSDLite MobileNet V2 downloaded from the TensorFlow detection model zoo. We use it because it is small and runs fast, even on a Raspberry Pi. In my tests, the model runs on the Pi in real time at a maximum of 1 frame per second. Boosting the frame rate to 8 FPS or above is overkill for this application, but if you want to do so on your Pi, feel free to check out my other tutorial on how to do it with a Movidius neural compute stick.
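
If you want to check the frame rate on your own Pi, a simple timing helper like the one below is enough. It is a generic sketch, not part of the release: measure_fps is a hypothetical name, and run_once stands for whatever function captures one frame and runs the detector on it.

```python
import time


def measure_fps(run_once, iterations=10):
    """Call run_once() repeatedly and return the average calls per second."""
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start
    return iterations / elapsed if elapsed > 0 else float("inf")
```

Averaging over several iterations smooths out the first call, which tends to be slower while the model warms up.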

To install the latest pre-built TensorFlow 1.9.0 and the object detection API dependencies on your Pi, run these commands in a terminal.

sudo apt install libatlas-base-dev protobuf-compiler python-pil python-lxml python-tk
pip3 install tensorflow

One note here: there is no need to download the TensorFlow model repository as you usually would to use the object detection API. It takes around 1GB of space on your Pi's SD card and wastes valuable time to download. All Python source code necessary to complete this tutorial is already packed inside the 27MB GitHub release you downloaded earlier from my repo.

To verify that your installation and camera are working, run this command.

python3 ./image_processor/examples/

The loading of the model might take around 30 seconds or so, and if everything works, you will see something like this.


Conclusion and further reading

So far you have learned how voice activation and object detection fit into the project. In the next post, I will show you how the remaining pieces come together.

This project is largely influenced by danmacnish/cartoonify. My version adds voice activation and packs compact cartoon drawing datasets, the object detection API, and an optimized Python thermal printer library into a single 27 MB release file, compared to the original author's ~5GB cartoon dataset, ~100MB TensorFlow model, and application source code.

Continue on to part 2 of the tutorial.

Useful resources

Installing operating system images for your Raspberry Pi

Use VNC to access your Pi's desktop
