After reading this tutorial, you will know how to make such a camera by putting the following pieces together.
Before getting started, make sure you have the following stuff ready.
Optional items that you can get by without, but that can make your camera look slick.
All source code and data resources are packaged into a single file. Download it from my GitHub release and extract it on your Pi with the following commands.
tar -xzf v1.1.tar.gz
Now we are ready to put the pieces together.
This Porcupine we are talking about isn't the cute rodent with sharp spines but an on-device wake word detection engine powered by deep learning. If you have read my previous post, How to do Real Time Trigger Word Detection with Keras, you will know what I am talking about. But this time it is so lightweight that it even runs on a Raspberry Pi with
To install the dependencies for Porcupine, run the following commands in a terminal on your Pi.
sudo apt update
sudo apt install -y libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg libav-tools
sudo pip3 install pyaudio soundfile -i https://pypi.douban.com/simple
python3 ./examples/porcupine_demo_non_blocking.py --keywords pineapple
If everything goes well, you will see something like this. It is waiting for you to say the keyword "pineapple".
Other pre-optimized keywords you can use are located in the ./porcupine/resources/keyword_files folder.
One side note: the Porcupine library is quite accurate by itself, but the result can be considerably affected by the microphone. Some webcams with only one built-in microphone can barely capture a voice clearly, and only within a limited range, while the two-microphone array on the Logitech C920 webcam can cancel noise in the environment and record your voice loud and clear even with
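To see which keywords are available in your copy, a short script can list that folder. This is only a minimal sketch: `list_keywords` is a hypothetical helper, not part of Porcupine, and it assumes each file is named as the keyword followed by an underscore and a platform suffix (something like `pineapple_raspberrypi`), so check that against the files you actually have.

```python
from pathlib import Path


def list_keywords(folder="./porcupine/resources/keyword_files"):
    """Return the sorted keyword names found in a keyword-file folder.

    Assumption: files are named "<keyword>_<platform>.<ext>". Files without
    an underscore are reported by their bare name.
    """
    keywords = set()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            # Drop the extension, then the trailing "_<platform>" part if present.
            keywords.add(path.stem.rsplit("_", 1)[0])
    return sorted(keywords)
```

Calling `print(list_keywords())` from the extracted release folder prints one name per keyword file found, and any of those names could then be passed to the demo's `--keywords` option in place of `pineapple`.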
Once the app is voice activated, the software will let the webcam capture images and try to locate objects inside.
Capturing images from the webcam requires the OpenCV 3 library to be installed on your Pi. People used to spend hours compiling it from source and installing it on the Pi, which can now be avoided by running these three lines in a terminal.
pip3 install opencv-python==18.104.22.168
sudo apt-get install libqtgui4
sudo apt-get install python-opencv
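Once OpenCV is installed, grabbing a single photo from the webcam can be sketched as below. This is not the code from the release package, just an illustration under a few assumptions: the webcam is camera index 0, and `read_frame` and `capture_photo` are hypothetical helper names. The retry loop is there because some webcams return a failed read right after they are opened.

```python
def read_frame(capture, attempts=5):
    """Read one frame from an opened capture object, retrying a few times.

    `capture` is anything with a cv2.VideoCapture-style read() method
    that returns an (ok, frame) pair.
    """
    for _ in range(attempts):
        ok, frame = capture.read()
        if ok:
            return frame
    raise RuntimeError("could not read a frame from the camera")


def capture_photo(path, camera_index=0):
    """Grab a single frame from the webcam and save it to `path`."""
    import cv2  # imported here so read_frame can be used without OpenCV

    capture = cv2.VideoCapture(camera_index)
    try:
        if not capture.isOpened():
            raise RuntimeError("could not open camera %d" % camera_index)
        frame = read_frame(capture)
        cv2.imwrite(path, frame)
        return frame
    finally:
        # Always free the camera, even if reading or saving failed.
        capture.release()
```

For example, `capture_photo("photo.jpg")` would save one image from the first webcam into the current folder.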
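The captured photo is passed to the TensorFlow Object Detection API, and the model returns four pieces of information: the bounding boxes of the detected objects, a confidence score for each box, a class label for each box, and the total number of detections.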
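The detection models in the TensorFlow model zoo report boxes, scores, classes and the number of detections, with each box given as normalized [ymin, xmin, ymax, xmax] values between 0 and 1. A minimal sketch of turning that raw output into something easier to use might look like this. The function name `filter_detections`, the dictionary keys and the 0.5 score threshold are assumptions made for illustration, and the inputs are plain Python lists rather than the arrays the model actually returns.

```python
def filter_detections(boxes, scores, classes, num_detections,
                      image_width, image_height, min_score=0.5):
    """Keep confident detections and convert boxes to pixel coordinates.

    boxes:   list of [ymin, xmin, ymax, xmax], normalized to the 0..1 range
    scores:  list of confidence scores, one per box
    classes: list of numeric class ids, one per box
    """
    results = []
    for i in range(int(num_detections)):
        if scores[i] < min_score:
            continue  # skip low-confidence detections
        ymin, xmin, ymax, xmax = boxes[i]
        results.append({
            "class_id": int(classes[i]),
            "score": float(scores[i]),
            # (left, top, right, bottom) in pixels
            "box": (int(round(xmin * image_width)),
                    int(round(ymin * image_height)),
                    int(round(xmax * image_width)),
                    int(round(ymax * image_height))),
        })
    return results
```

With a 640x480 photo, a box of `[0.25, 0.25, 0.75, 0.5]` becomes the pixel rectangle `(160, 120, 320, 360)`, which is the form you would need to draw it on the image.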
The model we use for object detection is an SSDLite MobileNet V2 downloaded from the TensorFlow detection model zoo. We use it because it is small and fast enough to run in real time even on a Raspberry Pi. In my tests, the model runs on the Pi at a maximum of 1 frame per second. If you want to boost the frame rate to 8 FPS or above on your Pi, which is overkill for this application, feel free to check out my other tutorial on how to do it with a Movidius neural compute stick.
To install the latest pre-built TensorFlow 1.9.0 and the object detection API dependencies on your Pi, run these commands in a terminal.
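If you want to check the frame rate on your own Pi, a small timing helper like the following can measure any per-frame function. It is a sketch with hypothetical names: `measure_fps` is not part of the release package, and `process_frame` stands in for whatever runs the detection model on one image.

```python
import time


def measure_fps(process_frame, frames, clock=time.perf_counter):
    """Run process_frame on each frame and return the average frames per second."""
    start = clock()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = clock() - start
    if count == 0 or elapsed <= 0:
        return 0.0  # nothing processed, or the clock did not advance
    return count / elapsed
```

Feeding it a few dozen captured frames gives a steadier number than timing a single frame, since the first run of a model is often slower than the rest.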
sudo apt install libatlas-base-dev protobuf-compiler python-pil python-lxml python-tk
pip3 install tensorflow
One note here: there is no need to download the TensorFlow models repository as you usually would to use the object detection API, since it takes around 1GB of space on your Pi's SD card and wastes valuable time to download. All Python source code necessary to complete this tutorial is already packed inside the 27MB GitHub release you downloaded earlier from my repo.
Loading the model might take around 30 seconds, and if everything works, you will see something like this.
So far you have learned how voice activation and object detection fit into the project. In the next post, I will show you how the rest of the pieces come together.
This project is largely influenced by
Use VNC to access your Pi's desktop