How to train an object detection model with mmdetection



A while back, you learned how to train an object detection model with the TensorFlow object detection API and Google Colab's free GPU; if you haven't, check out that post. The models in the TensorFlow object detection API are somewhat dated and missing updates for state-of-the-art architectures like Cascade R-CNN and RetinaNet. There is a PyTorch counterpart called mmdetection, which includes more pre-trained, state-of-the-art object detection models we can train on custom data. However, setting it up requires a nontrivial amount of time installing the environment, writing the config file, and getting the dataset into the right format. The good news is that you can skip the boring stuff and jump directly into the fun part: training your model.

Here is an overview of how to make it happen,

1. Annotate some images, and make train/test split.

2. Run the Colab notebook to train your model.

Step 1: Annotate some images and make train/test split

This step is only necessary if you want to use your own images instead of the ones that come with my repository. Start by forking my repository, then delete the data folder in the project directory so you can start fresh with your custom data.

If you took your images with your phone, the resolution might be 2K or 4K depending on your phone's settings. In that case, we will scale the images down to reduce the overall dataset size and speed up training.

You can use the script in the repository to resize your images.

First, save all your photos to one folder outside of the project directory so they won't accidentally get uploaded to GitHub later. Ideally, all photos should have the jpg extension. Then run this script to resize all photos and save them to the project directory.

python resize_images.py --raw-dir <photo_directory> --save-dir ./data/VOC2007/JPEGImages --ext jpg --target-size "(800, 600)"
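If you are curious what the resizing boils down to, the core of it is a size calculation plus a save. Here is a minimal sketch of the sizing logic, assuming the script fits each image inside the target size while preserving aspect ratio (the function name `fit_within` is mine, not the repository's):

```python
def fit_within(width, height, target=(800, 600)):
    """Scale (width, height) to fit inside target, preserving aspect ratio."""
    scale = min(target[0] / width, target[1] / height)
    scale = min(scale, 1.0)  # never upscale small images
    return int(round(width * scale)), int(round(height * scale))

# A 4:3 phone photo at 4032x3024 lands exactly on the 800x600 target.
print(fit_within(4032, 3024))
```

The actual file I/O would be handled by an image library such as Pillow (`Image.open(...).resize(...)`).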

You might wonder why "VOC" appears in the path: the annotation tool we use generates Pascal VOC formatted annotation XML files. It isn't necessary to dig into the actual format of the XML files, since the annotation tool handles all of that. You guessed it, it's the same tool we used previously: LabelImg, which works on both Windows and Linux.

Download LabelImg and open it up,

1. Verify that "PascalVOC" is selected; it is the default annotation format.

2. Open your resized image folder "./data/VOC2007/JPEGImages" for annotation.

3. Change the save directory for the XML annotation files to "./data/VOC2007/Annotations".


As usual, use shortcuts (w: draw box, d: next file, a: previous file, etc.) to accelerate the annotation.

Once you are done, you will find the XML files in the "./data/VOC2007/Annotations" folder, with the same base names as your image files.

For the train/test split, you are going to create two text files, each containing a list of file base names, one name per line. These two files go in the folder "data/VOC2007/ImageSets/Main", named trainval.txt and test.txt respectively. If you don't want to type all the file names by hand, cd into the "Annotations" directory and run this shell command,

ls -1 | sed -e 's/\.xml$//' | sort -n

That gives you a nicely sorted list of file base names; just split it into two parts and paste them into the two text files.
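Alternatively, a small Python script can write the two files for you. The sketch below (`write_splits` is a hypothetical helper, not part of the repository) shuffles the base names with a fixed seed and holds out a fraction for testing:

```python
import os
import random

def write_splits(annotations_dir, output_dir, test_fraction=0.2, seed=42):
    """Split XML base names into trainval.txt and test.txt."""
    names = sorted(os.path.splitext(f)[0]
                   for f in os.listdir(annotations_dir)
                   if f.endswith(".xml"))
    random.Random(seed).shuffle(names)
    n_test = int(len(names) * test_fraction)
    splits = {"test.txt": names[:n_test], "trainval.txt": names[n_test:]}
    os.makedirs(output_dir, exist_ok=True)
    for fname, subset in splits.items():
        with open(os.path.join(output_dir, fname), "w") as f:
            f.write("\n".join(subset) + "\n")
    return splits

# Example: write_splits("data/VOC2007/Annotations",
#                       "data/VOC2007/ImageSets/Main")
```

Shuffling before splitting avoids putting all of the later photos (which may share lighting or background) into the test set.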

Now your data directory structure should look similar to the one below.

   data
   └── VOC2007
       ├── Annotations
       │   ├── 0.xml
       │   ├── ...
       │   └── 9.xml
       ├── ImageSets
       │   └── Main
       │       ├── test.txt
       │       └── trainval.txt
       └── JPEGImages
           ├── 0.jpg
           ├── ...
           └── 9.jpg

Update your fork of the GitHub repository with your labeled dataset so you can clone it from Colab.

git add --all
git commit -m "Update datasets"
git push

Step 2: Train the model on the Colab notebook

We are ready to launch the Colab notebook and fire up the training. Similar to the TensorFlow object detection API, instead of training the model from scratch, we do transfer learning from a pre-trained backbone, such as the resnet50 specified in the model config file.

The notebook allows you to select the model config and set the number of training epochs.

So far, I have only tested two model configs, faster_rcnn_r50_fpn_1x and cascade_rcnn_r50_fpn_1x, but other configs can be incorporated as demonstrated in the notebook.

The notebook handles several things before training the model,

  1. Installing mmdetection and its dependencies.
  2. Replacing "CLASSES" in the VOC dataset definition file with your custom dataset's class labels.
  3. Modifying your selected model config file: updating the number of classes to match your dataset, changing the dataset type to VOCDataset, setting the total number of training epochs, and more.
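For a sense of what step 3 involves: in this generation of mmdetection, configs were plain Python files, so the notebook can patch them with simple string substitution. Here is a rough, hypothetical sketch (the notebook's actual implementation may differ; in older mmdetection versions, num_classes typically counts your labels plus one for background):

```python
import re

def patch_config(config_text, num_classes, total_epochs,
                 dataset_type="VOCDataset"):
    """Rewrite a few fields of an mmdetection config given as a string."""
    # Update the class count (labels + 1 for background in mmdetection v1.x).
    config_text = re.sub(r"num_classes=\d+",
                         "num_classes={}".format(num_classes), config_text)
    # Set the total number of training epochs.
    config_text = re.sub(r"total_epochs\s*=\s*\d+",
                         "total_epochs = {}".format(total_epochs), config_text)
    # Switch the dataset type, e.g. from CocoDataset to VOCDataset.
    config_text = re.sub(r"dataset_type\s*=\s*'\w+'",
                         "dataset_type = '{}'".format(dataset_type),
                         config_text)
    return config_text
```

The patched text is then written back to the config file before training starts.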

After that, it re-runs the mmdetection package install script, so the changes to the file propagate to the system's Python packages.

%cd {mmdetection_dir}
!python setup.py install

Since your data directory resides outside of the mmdetection directory, we have the following cell in the notebook which creates a symbolic link into the project data directory.

import os

os.makedirs("data/VOCdevkit", exist_ok=True)
voc2007_dir = os.path.join(project_name, "data/VOC2007")
# Use an absolute path so the symlink resolves regardless of
# the current working directory.
os.system("ln -s {} data/VOCdevkit".format(os.path.abspath(voc2007_dir)))

Then start the training.

!python tools/train.py {config_fname}

The training time depends on the size of your dataset and the number of training epochs; my demo takes several minutes to complete on Colab's Tesla T4 GPU.

After training, you can test drive the model with an image in the test set like so.

%cd {mmdetection_dir}
from mmcv.runner import load_checkpoint
from mmdet.apis import inference_detector, show_result, init_detector

checkpoint_file = os.path.join(mmdetection_dir, work_dir, "latest.pth")
score_thr = 0.8

# build the model from a config file and a checkpoint file
model = init_detector(config_fname, checkpoint_file)

# test a single image and show the results
img = 'data/VOCdevkit/VOC2007/JPEGImages/15.jpg'
result = inference_detector(model, img)
show_result(img, result, model.CLASSES, score_thr=score_thr, out_file="result.jpg")

# Show the image with bbox overlays.
from IPython.display import Image
Image(filename="result.jpg")

And here is the result, as you expected,


Conclusion and further reading

This tutorial showed you how to train a PyTorch mmdetection object detection model with your custom dataset, with minimal effort, on a Google Colab notebook.

If you are using my GitHub repo, you probably noticed that mmdetection is included as a submodule; to update it in the future, run this command.

git submodule update --recursive

Considering training with another model config? You can find the list of config files here, along with their specs such as complexity (Mem (GB)) and accuracy (box AP). Then start by adding the config file to MODELS_CONFIG at the start of the notebook.
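For reference, MODELS_CONFIG can be as simple as a dict mapping a short model name to its config path under mmdetection's configs/ directory. The layout below is illustrative and not necessarily identical to the notebook's:

```python
# Hypothetical layout: each entry maps a short name to the relative path
# of a config file inside the mmdetection repository.
MODELS_CONFIG = {
    "faster_rcnn_r50_fpn_1x": {
        "config_file": "configs/faster_rcnn_r50_fpn_1x.py",
    },
    "cascade_rcnn_r50_fpn_1x": {
        "config_file": "configs/cascade_rcnn_r50_fpn_1x.py",
    },
    # A new entry, e.g. RetinaNet, would be added like this:
    "retinanet_r50_fpn_1x": {
        "config_file": "configs/retinanet_r50_fpn_1x.py",
    },
}

selected_model = "faster_rcnn_r50_fpn_1x"
config_fname = MODELS_CONFIG[selected_model]["config_file"]
```

The rest of the notebook only needs `config_fname`, so adding a model is a one-entry change.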


In future posts, we will look into benchmarking those custom trained model as well as their deployment to edge computing devices, stay tuned and happy coding!
