TL;DR: after reading this post, you will know how to run state-of-the-art object detection and segmentation on a video file, fast, even on an old laptop with an integrated graphics card, an old CPU, and only 2 GB of RAM.
So here is the catch: this will only work if you have an internet connection and a Google account. Since you are reading this, it is very likely you already qualify.
All the code in this post runs entirely in the cloud, thanks to Google's Colaboratory (Colab for short). I am going to show you how to run our code on Colab with a server-grade CPU, more than 10 GB of RAM, and a powerful GPU, for FREE! Yes, you heard me right.
Colab was built to help machine learning professionals collaborate more seamlessly. I have shared my Python notebook for this post; click to open it.
Log in to your Google account in the upper right corner if you haven't done so. The page will then offer to open the notebook with Colab at the top of the screen. Make a copy so you can edit it.
Now you should be able to click the "Runtime" menu to choose the Python version and whether to use a GPU or CPU to accelerate the computation.
The environment is all set. Easy, isn't it? No hassles like installing CUDA and cuDNN.
Run this code to confirm TensorFlow can see the GPU.
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
It outputs:
Found GPU at: /device:GPU:0
Great, we are good to go!
If you are curious about the GPU model you are using: it is an Nvidia Tesla K80 with 24 GB of memory. Quite powerful.
Run this code to find out yourself.
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
You will see later that the 24 GB of graphics memory really helps: it makes it possible to process more frames at a time, which accelerates the video processing.
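You can also peek at the GPU and its memory usage directly with nvidia-smi, which is available on Colab's GPU runtime:
!nvidia-smi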
The demo is based on the Mask R-CNN GitHub repo, an implementation of Mask R-CNN on Keras + TensorFlow. It not only generates bounding boxes for detected objects but also a mask over each object's area.
Mask R-CNN has some dependencies to install before we can run the demo. Colab allows you to install Python packages through pip, and general Linux package/library through apt-get.
In case you don't know yet: your current Google Colab instance is running on an Ubuntu virtual machine, so you can run almost every Linux command you would on a regular Linux machine.
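For example, you can inspect the VM with ordinary shell commands by prefixing them with '!':
# Check the Ubuntu release and available memory of the Colab VM
!lsb_release -a
!free -h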
Mask R-CNN depends on pycocotools, the COCO Python API. The cells below install and build it.
!pip install Cython
!git clone https://github.com/waleedka/coco
!pip install -U setuptools
!pip install -U wheel
!make install -C coco/PythonAPI
This clones the coco repository from GitHub, installs the build dependencies, and finally builds and installs the coco API (pycocotools).
All this happens in the cloud virtual machine, and quite quickly.
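A quick, optional sanity check that the build succeeded:
# If this import works, pycocotools was built and installed correctly
from pycocotools.coco import COCO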
We are now ready to clone the Mask_RCNN repo from GitHub and cd into the directory.
!git clone https://github.com/matterport/Mask_RCNN
# cd to the code directory and optionally download the weights file
import os
os.chdir('./Mask_RCNN')
!wget https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5
Notice how we change directory with Python (os.chdir) instead of running a shell 'cd' command: since we are running Python in a notebook, every '!' shell command executes in its own subshell, so a '!cd' would not persist to the next command.
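A quick illustration of the difference, safe to try in a scratch cell:
# Each '!' command runs in its own subshell, so this cd does not stick:
!cd /tmp && pwd   # prints /tmp, but only inside that one subshell
!pwd              # still the previous working directory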
Now you should be able to run the Mask R-CNN demo on Colab as you would on a local machine. So go ahead and run it in your Colab notebook.
So far, those sample images came from the GitHub repo. But how do you run predictions on your own images?
To upload an image to the Colab notebook, there are three options I can think of.
1. Use a free image hosting provider like imgbb.
2. Upload the image to any other host that gives you a direct download link.
After uploading an image by either of those two options, you will get a link to it, which can be downloaded to your Colab VM with the Linux wget command. The command below downloads one image to the ./images folder.
!wget https://preview.ibb.co/cubifS/sh_expo.jpg -P ./images
The first two options are ideal if you just want to upload one or two images and don't mind that other people on the internet can also see them, given the link.
3. Use Google Drive
This option is ideal if you have private images/videos/other files to upload to Colab.
Run this block to authenticate the VM to connect to your Google Drive.
# Install the google-drive-ocamlfuse FUSE client for Google Drive
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
# Authenticate the Colab VM with your Google account
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
# Generate an auth URL; open it, then paste the verification code below
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
It will ask for a verification code twice during the run.
Then execute this cell to mount your Drive at the directory 'drive':
!mkdir -p drive
!google-drive-ocamlfuse drive
You can now access your Google Drive content under the directory ./drive:
!ls drive/
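For example, assuming you uploaded a clip named trailer1.mp4 into a 'videos' folder in your Drive (that path is just an assumption for illustration), you could copy it to where the video code later in this post expects it:
!mkdir -p videos
!cp drive/videos/trailer1.mp4 ./videos/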
Hope you are having fun so far. Why not try this on a video file?
Processing a video file takes three steps:
1. Split the video into image frames.
2. Run the model on the images.
3. Turn the processed images back into an output video.
In our previous demo, we asked the model to process just one image at a time, as configured by IMAGES_PER_GPU:
class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
If we processed the whole video one frame at a time, it would take a long time. So instead, we are going to leverage the GPU to process multiple frames in parallel.
The Mask R-CNN pipeline is quite computationally intensive and takes a lot of GPU memory. I find the Tesla K80 GPU on Colab, with its 24 GB of memory, can safely process three images at a time. If you go beyond that, the notebook might crash in the middle of processing the video.
So in the code below, we set batch_size to 3.
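Since batch size = GPU_COUNT * IMAGES_PER_GPU, the inference config has to match. Here is a minimal sketch of the updated config (the model is then rebuilt with it, just as in the earlier demo):
batch_size = 3

class InferenceConfig(coco.CocoConfig):
    # Process batch_size frames per forward pass:
    # batch size = GPU_COUNT * IMAGES_PER_GPU = 3
    GPU_COUNT = 1
    IMAGES_PER_GPU = batch_size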
# Start with an empty frame buffer and a frame counter
frames = []
frame_count = 0

capture = cv2.VideoCapture(os.path.join(VIDEO_DIR, 'trailer1.mp4'))
while True:
    ret, frame = capture.read()
    # Bail out when the video file ends
    if not ret:
        break
    # Save each frame of the video to a list
    frame_count += 1
    frames.append(frame)
    if len(frames) == batch_size:
        results = model.detect(frames, verbose=0)
        for i, (frame, r) in enumerate(zip(frames, results)):
            frame = display_instances(
                frame, r['rois'], r['masks'], r['class_ids'], class_names, r['scores']
            )
            name = '{0}.jpg'.format(frame_count + i - batch_size)
            name = os.path.join(VIDEO_SAVE_DIR, name)
            cv2.imwrite(name, frame)
        # Clear the frames list to start the next batch
        frames = []
capture.release()
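One caveat: if the clip's total frame count is not a multiple of batch_size, the last few frames never fill a batch and are silently dropped. Mask R-CNN's detect() expects a full batch of images, so one way to flush the remainder is to pad the final batch. Here is a sketch reusing the names above; the padded duplicates are simply never saved:
# Flush leftover frames that didn't fill a final batch
if frames:
    # Pad with copies of the last frame so detect() sees a full batch
    padded = frames + [frames[-1]] * (batch_size - len(frames))
    results = model.detect(padded, verbose=0)
    for i, (frame, r) in enumerate(zip(frames, results)):
        frame = display_instances(
            frame, r['rois'], r['masks'], r['class_ids'], class_names, r['scores']
        )
        name = '{0}.jpg'.format(frame_count + i - len(frames))
        name = os.path.join(VIDEO_SAVE_DIR, name)
        cv2.imwrite(name, frame)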
After running this code, you should have all the processed image files in the ./videos/save directory.
The next step is easy: we generate the new video from those images using cv2's VideoWriter. But there are two things you want to make sure of:
1. The frames need to be ordered the same way they were extracted from the original video (or backward, if you prefer to watch the video that way).
# Get all image file paths into a list.
images = list(glob.iglob(os.path.join(VIDEO_SAVE_DIR, '*.*')))
# Sort the images by the numeric index in the file name
# ('12.jpg'[:-3] is '12.', which float() parses as 12.0).
images = sorted(images, key=lambda x: float(os.path.split(x)[1][:-3]))
2. The frame rate (FPS) of the output video should match that of the original clip. The snippet below reads the FPS from the source video.
video = cv2.VideoCapture(os.path.join(VIDEO_DIR, 'trailer1.mp4'))
# Find OpenCV version
(major_ver, minor_ver, subminor_ver) = (cv2.__version__).split('.')
if int(major_ver) < 3:
    fps = video.get(cv2.cv.CV_CAP_PROP_FPS)
    print("Frames per second using video.get(cv2.cv.CV_CAP_PROP_FPS): {0}".format(fps))
else:
    fps = video.get(cv2.CAP_PROP_FPS)
    print("Frames per second using video.get(cv2.CAP_PROP_FPS): {0}".format(fps))
video.release()
Finally here is the code to generate the video from processed image frames.
def make_video(outvid, images=None, fps=30, size=None,
               is_color=True, format="FMP4"):
    """
    Create a video from a list of images.

    @param outvid    output video file path
    @param images    list of image file paths to use in the video
    @param fps       frames per second
    @param size      size of each frame
    @param is_color  color or grayscale
    @param format    see http://www.fourcc.org/codecs.php
    @return          see http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_gui/py_video_display/py_video_display.html
    """
    from cv2 import VideoWriter, VideoWriter_fourcc, imread, resize
    fourcc = VideoWriter_fourcc(*format)
    vid = None
    for image in images:
        if not os.path.exists(image):
            raise FileNotFoundError(image)
        img = imread(image)
        if vid is None:
            if size is None:
                size = img.shape[1], img.shape[0]
            vid = VideoWriter(outvid, fourcc, float(fps), size, is_color)
        # Resize if either dimension differs from the target size
        if size[0] != img.shape[1] or size[1] != img.shape[0]:
            img = resize(img, size)
        vid.write(img)
    vid.release()
    return vid
import glob
import os
# Directories for the source video and the processed frames
ROOT_DIR = os.getcwd()
VIDEO_DIR = os.path.join(ROOT_DIR, "videos")
VIDEO_SAVE_DIR = os.path.join(VIDEO_DIR, "save")
images = list(glob.iglob(os.path.join(VIDEO_SAVE_DIR, '*.*')))
# Sort the images by the numeric index in the file name
images = sorted(images, key=lambda x: float(os.path.split(x)[1][:-3]))
outvid = os.path.join(VIDEO_DIR, "out.mp4")
make_video(outvid, images, fps=30)
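If you want the output to honor point 2 above and match the source frame rate exactly, pass the fps value read earlier instead of the hard-coded 30 (assuming that notebook cell has been run, so fps is still in scope):
make_video(outvid, images, fps=fps)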
If you have come this far, the processed video should now be ready to download to your local machine.
from google.colab import files
files.download('videos/out.mp4')
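Alternatively, if the browser download is slow for a large clip, you can copy the result into your mounted Google Drive instead (assuming the Drive mount from earlier is still active):
!cp videos/out.mp4 drive/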
Feel free to try it on your favorite video clip. Maybe intentionally decrease the frame rate when reconstructing the video to watch it in slow motion.
In this post, we walked through how to run your model on Google Colab with GPU acceleration. You have learned how to do object detection and segmentation on a video; thanks to the powerful GPU on Colab, it was possible to process multiple frames in parallel and speed up the whole process.
If you want to learn more about the technology behind the object detection and segmentation algorithm, here is the original Mask R-CNN paper, which goes through the details of the model.
Or, if you are just getting started with object detection, check out my object detection/localization guide series, which goes through the essential basics shared by many models.
Here, once again, are the Python notebook for this post and the GitHub repo, for your convenience.