Recent Advances in Deep Learning for Object Detection - Part 2

In the second part of the Recent Advances in Deep Learning for Object Detection series, we will summarize three aspects of object detection: proposal generation, feature representation learning, and learning strategy.

Proposal Generation

A proposal generator produces a set of rectangular bounding boxes that potentially contain objects.
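
Proposal quality is commonly measured by recall. Recall here is the fraction of ground-truth objects that at least one proposal overlaps with an intersection-over-union (IoU) at or above a threshold. The minimal sketch below assumes boxes in (x1, y1, x2, y2) pixel format and a threshold of 0.5. The function names and example boxes are illustrative and do not come from any particular detector.

```python
from typing import List, Tuple

# Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (an assumed convention).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def proposal_recall(proposals: List[Box], gt_boxes: List[Box], thresh: float = 0.5) -> float:
    """Fraction of ground-truth boxes covered by at least one proposal with IoU >= thresh."""
    if not gt_boxes:
        return 1.0  # nothing to recall
    hits = sum(1 for g in gt_boxes if any(iou(p, g) >= thresh for p in proposals))
    return hits / len(gt_boxes)


gt = [(10, 10, 50, 50), (100, 100, 150, 160)]
proposals = [(12, 8, 48, 52), (0, 0, 20, 20), (200, 200, 220, 220)]
print(proposal_recall(proposals, gt))  # 0.5: only the first object is covered
```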

Traditional Computer Vision Methods

These methods generate proposals in one of three ways:

- i) computing an "objectness score" for each candidate box;
- ii) merging super-pixels from the original image;
- iii) generating multiple foreground and background segments.

Their primary advantage is that they are very simple and can generate proposals with high recall. However, they rely mainly on low-level visual cues such as color or edges. They also cannot be jointly optimized with the rest of the detection pipeline, so they are unable to exploit large-scale datasets to improve representation learning.

Anchor-based Methods

The anchor priors are manually designed with multiple scales and aspect ratios in a heuristic manner.

- Region Proposal Network (RPN): a 256-dimensional feature vector was extracted from each anchor and fed into two sibling branches, a classification layer and a regression layer. RPN first evaluated whether each anchor proposal was foreground or background, and the categorical classification was performed in the next stage.
- SSD (multi-scale anchors): assigned categorical probabilities to each anchor proposal directly.
- Single Shot Scale-invariant Face Detector (S3FD): based on SSD, with anchors carefully designed to match the objects. Different anchor priors were designed according to the effective receptive field of each feature map.
- Dimension-Decomposition Region Proposal Network (DeRPN): used an anchor string mechanism to match object widths and heights independently. This helped match objects with large scale variance and reduced the search space.
- Single-Shot Refinement Neural Network (RefineDet): refined the manually defined anchors in two steps. This significantly improved anchor quality and final prediction accuracy in a data-driven manner.
- Cascade R-CNN: refined proposals in a cascaded way.
- MetaAnchor: an improvement over other manually defined methods, although the customized anchors were still designed manually.

Keypoints-based Methods

- DeNet: modeled the distribution of each location being one of the four corner types of an object. This corner-based algorithm eliminated the design of anchors and became a more effective way to produce high-quality proposals.
- CornerNet: modeled the top-left and bottom-right corners of objects. It used novel feature embedding methods and a corner pooling layer to correctly match keypoints belonging to the same object, obtaining state-of-the-art results on public benchmarks.
- Feature Selective Anchor-Free (FSAF): an online feature selection block is applied to train multilevel center-based branches attached to each level of the feature pyramid.
- CenterNet: combined the ideas of center-based and corner-based methods. It first predicted bounding boxes from pairs of corners, and then predicted center probabilities for these initial predictions to reject easy negatives.

Other Methods

- AZnet: predicted two values for each region, a zoom indicator and adjacency scores. The zoom indicator determined whether to further divide the region, which may contain smaller objects, and the adjacency scores denoted its objectness. AZnet was better at matching sparse and small objects than RPN's anchor-object matching approach.
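
The anchor-based pipeline above can be sketched in two steps:

1. Tile anchors of several scales and aspect ratios over every feature-map cell.
2. Label each anchor as foreground or background by its best IoU with the ground truth.

This is a simplified, RPN-style sketch under stated assumptions. Aspect ratio is taken as height/width, and the 0.7/0.3 thresholds follow the values reported for Faster R-CNN's RPN. The extra rule that also marks each ground-truth box's best-matching anchor as positive is omitted. The helper names are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def generate_anchors(feat_h: int, feat_w: int, stride: int,
                     scales: List[float], ratios: List[float]) -> List[Box]:
    """Tile len(scales) * len(ratios) anchors centred on every feature-map cell.

    `ratio` is height / width; every anchor of scale s has area s * s.
    """
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride  # cell centre in image pixels
            for s in scales:
                for r in ratios:
                    w, h = s / math.sqrt(r), s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def label_anchors(anchors: List[Box], gt_boxes: List[Box],
                  pos_thresh: float = 0.7, neg_thresh: float = 0.3) -> List[int]:
    """1 = foreground, 0 = background, -1 = ignored (best IoU between the thresholds)."""
    labels = []
    for a in anchors:
        best = max((iou(a, g) for g in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    return labels


anchors = generate_anchors(2, 2, stride=16, scales=[16, 32], ratios=[0.5, 1.0, 2.0])
labels = label_anchors(anchors, gt_boxes=[(0, 0, 16, 16)])
print(len(anchors), labels.count(1))  # 24 1
```

Anchors whose best IoU falls between the two thresholds are labeled -1. These anchors do not contribute to the training loss, so ambiguous boxes act as neither positives nor negatives.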

Feature Representation Learning

Feature representation learning methods fall into four categories: multi-scale feature learning, region feature encoding, contextual reasoning, and deformable feature learning.

Multi-scale Feature Learning

- Image Pyramid: resize the input image to a number of different scales and train multiple detectors, each responsible for a certain range of scales. Example: Scale Normalization for Image Pyramids (SNIP).
- Integrated Features: construct a single feature map by combining features from multiple layers, and make final predictions based on the newly constructed map. Examples: Inside-Outside Network (ION), HyperNet, Multi-scale Location-aware Kernel Representation (MLKP).
- Prediction Pyramid: make predictions from multiple layers, where each layer is responsible for objects of a certain scale. Examples: SSD, Receptive Field Block Net (RFBNet).
- Feature Pyramid: combine the advantages of Integrated Features and Prediction Pyramid. Example: Feature Pyramid Network (FPN).

Region Feature Encoding

- ROI Pooling: extracted features from the down-sampled feature map and, as a result, struggled to handle small objects.
- ROI Warping: encoded region features via bilinear interpolation. Because of the downsampling operations in a DCNN, the object position in the original image and in the downsampled feature maps can be misaligned.
- ROI Align: addressed the quantization issue with bilinear interpolation at fractionally sampled positions within each grid cell.
- Precise ROI Pooling (PrROI Pooling): avoided any quantization of coordinates and had a continuous gradient with respect to the bounding box coordinates.
- Position-Sensitive ROI Pooling (PSROI Pooling): enhanced the spatial information of the downsampled region features.
- CoupleNet: combined the outputs of an ROI Pooling layer and a PSROI Pooling layer. The ROI Pooling layer extracted global region information but struggled with highly occluded objects, while the PSROI Pooling layer focused more on local information.
- Deformable ROI Pooling: can automatically model the image content without being constrained by fixed receptive fields.

Contextual Reasoning

Learning the relationship between objects and their surrounding context can improve a detector's ability to understand the scene. There are two aspects: global context and region context. Examples: Spatial Memory Network (SMN), Structure Inference Net (SIN), Gated Bi-Directional CNN (GBDNet).

Deformable Feature Learning

These methods are robust to nonrigid deformation of objects. For example, DeepIDNet developed a deformation-aware pooling layer to encode deformation information across different object categories.
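
To make the quantization discussion concrete, here is a minimal single-channel sketch of RoI Align in pure Python. It works in three steps:

1. Map the RoI to feature-map coordinates without rounding.
2. Sample each output bin at a few fractional points.
3. Read each point with bilinear interpolation.

The 2x2 output, two samples per bin axis, average pooling, and border clamping are illustrative assumptions. Library implementations such as torchvision's `roi_align` differ in details like the half-pixel offset and adaptive sampling.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]  # one channel, indexed fmap[y][x]


def bilinear(fmap: FeatureMap, y: float, x: float) -> float:
    """Bilinear interpolation at a fractional (y, x); coordinates are clamped to the map."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    return ((1 - ly) * (1 - lx) * fmap[y0][x0] + (1 - ly) * lx * fmap[y0][x1]
            + ly * (1 - lx) * fmap[y1][x0] + ly * lx * fmap[y1][x1])


def roi_align(fmap: FeatureMap, roi: Tuple[float, float, float, float],
              out_size: int = 2, spatial_scale: float = 1.0, samples: int = 2) -> FeatureMap:
    """RoI Align (average pooling) for one RoI given as (x1, y1, x2, y2) in image coordinates."""
    # Scale to feature-map coordinates WITHOUT rounding, unlike RoI Pooling.
    x1, y1, x2, y2 = (c * spatial_scale for c in roi)
    bin_w, bin_h = (x2 - x1) / out_size, (y2 - y1) / out_size
    out = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced, fractional sample points inside the bin.
                    y = y1 + (by + (sy + 0.5) / samples) * bin_h
                    x = x1 + (bx + (sx + 0.5) / samples) * bin_w
                    total += bilinear(fmap, y, x)
            row.append(total / (samples * samples))
        out.append(row)
    return out


fmap = [[x + 10 * y for x in range(4)] for y in range(4)]  # a 4x4 linear ramp
print(roi_align(fmap, (0.5, 0.5, 2.5, 2.5)))  # [[11.0, 12.0], [21.0, 22.0]]
```

Because the example feature map is a linear ramp, bilinear interpolation is exact. Each output therefore equals the ramp's value at its bin centre; for example, the centre (x=1, y=1) gives 1 + 10 * 1 = 11. RoI Pooling would instead snap the fractional RoI corners to integer cells first.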

Learning Strategy

Learning strategies tackle problems such as imbalanced sampling, localization accuracy, and model acceleration, in both the training and the testing stage.

Training Stage

- Data Augmentation: horizontal flips of training images were used to train the Faster R-CNN detector. One-stage detectors used more intensive data augmentation, including rotation, random crops, expansion, and color jittering.
- Imbalance Sampling:
  - Hard negative sampling: negative proposals with higher classification loss were selected for training.
  - Focal loss: suppressed the gradient signals of easy samples, which led the training process to focus more on hard proposals.
  - Gradient harmonizing mechanism (GHM): not only suppressed easy proposals but also avoided the negative impact of outliers.
- Localization Refinement: examples include LocNet, MultiPath Network, and FitnessNMS. Grid R-CNN replaced the linear bounding box regressor with a corner-based mechanism that locates keypoints.
- Cascade Learning: a coarse-to-fine learning strategy that collects information from the output of the given classifiers to build stronger classifiers in a cascaded manner. RefineDet and Cascade R-CNN used cascade learning to refine object locations.
- Others:
  - Adversarial learning: Perceptual GAN for small object detection learned high-resolution feature representations of small objects via an adversarial scheme.
  - Training from scratch: this is motivated by two issues. First, the bias of loss functions and data distributions between classification and detection can hurt performance. Second, transferring a classification model to detection in a new domain can raise further challenges. Examples: DSOD (Deeply Supervised Object Detectors), gated recurrent feature pyramid.
  - Knowledge distillation: distills the knowledge of an ensemble of models into a single model via a teacher-student training scheme.

Testing Stage

- Duplicate Removal:
  - Non-maximum suppression (NMS): a predefined threshold can suppress a correct prediction that overlaps a higher-scoring neighbor, which is very common when detecting clustered objects.
  - Soft-NMS: instead of removing an overlapping box B, it decays B's confidence score with a continuous function F of B's overlap with the highest-scoring box. This avoids eliminating predictions of clustered objects.
- Model Acceleration: examples include R-FCN, Light-Head R-CNN, and MobileNet with depthwise convolution layers. Models can also be optimized offline, for example with model compression and quantization, or with acceleration toolkits such as TensorRT.
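
To pin down the Soft-NMS description above, introduce M as a label for the current highest-scoring box. B is any remaining box, and F is the decay applied to B's score as a function of IoU(M, B). The sketch below implements hard NMS and the linear and Gaussian Soft-NMS decays in one loop. The thresholds and sigma are illustrative defaults, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes: Sequence[Box], scores: Sequence[float], method: str = "gaussian",
             iou_thresh: float = 0.3, sigma: float = 0.5,
             score_thresh: float = 0.001) -> List[Tuple[int, float]]:
    """Return (index, final score) pairs in the order boxes are selected.

    method="hard":     classic NMS, F(o) = 0 if o >= iou_thresh else 1
    method="linear":   Soft-NMS,    F(o) = 1 - o if o >= iou_thresh else 1
    method="gaussian": Soft-NMS,    F(o) = exp(-o**2 / sigma)
    where o = IoU(M, B).
    """
    scores = list(scores)
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])  # M: highest current score
        remaining.remove(m)
        kept.append((m, scores[m]))
        survivors = []
        for i in remaining:
            o = iou(boxes[m], boxes[i])
            if method == "hard":
                decay = 0.0 if o >= iou_thresh else 1.0
            elif method == "linear":
                decay = 1.0 - o if o >= iou_thresh else 1.0
            elif method == "gaussian":
                decay = math.exp(-(o * o) / sigma)
            else:
                raise ValueError(f"unknown method: {method}")
            scores[i] *= decay
            if scores[i] >= score_thresh:  # drop boxes whose score has decayed away
                survivors.append(i)
        remaining = survivors
    return kept


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]  # two clustered objects, one far away
scores = [0.9, 0.8, 0.7]
print([i for i, _ in soft_nms(boxes, scores, method="hard")])      # [0, 2]
print([i for i, _ in soft_nms(boxes, scores, method="gaussian")])  # [0, 2, 1]
```

The first two boxes overlap heavily (IoU about 0.82). Hard NMS discards the second detection outright, while the Gaussian variant keeps it with a reduced score. That difference is what lets Soft-NMS preserve clustered objects.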

Conclusion and further thoughts

This series gives you an overview of several critical components of deep learning for object detection, as well as how they build upon each other. Finally, let's conclude the series with the network structure of Faster R-CNN with FPN.

[Figure: network structure of Faster R-CNN with FPN]
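
The FPN part of this structure builds its pyramid with a top-down pathway. The coarsest backbone map passes through a 1x1 (lateral) convolution, is upsampled by 2, and is added element-wise to a 1x1-convolved copy of the next finer map, and so on down the pyramid. The toy sketch below assumes single-channel maps stored as nested lists, so each 1x1 convolution reduces to a scalar weight. It deliberately leaves out two things a real model has: the 3x3 convolution that FPN applies after each merge, and multi-channel tensors. The function names are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel, indexed f[y][x]


def upsample2x(f: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling."""
    out = []
    for row in f:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def lateral(f: FeatureMap, weight: float) -> FeatureMap:
    """Stand-in for the 1x1 lateral convolution; with one channel it is just a scaling."""
    return [[weight * v for v in row] for row in f]


def add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(c_maps: List[FeatureMap], weights: List[float]) -> List[FeatureMap]:
    """Top-down pathway with lateral connections.

    c_maps are backbone outputs ordered fine to coarse (e.g. C2..C5), each half the
    size of the previous one. Returns the merged maps (e.g. P2..P5) in the same order.
    """
    p = lateral(c_maps[-1], weights[-1])  # coarsest level: lateral connection only
    merged = [p]
    for c, w in zip(reversed(c_maps[:-1]), reversed(weights[:-1])):
        p = add(upsample2x(p), lateral(c, w))  # upsample coarser map, add lateral map
        merged.append(p)
    return merged[::-1]


c4 = [[1.0, 2.0], [3.0, 4.0]]
c5 = [[10.0]]
print(fpn_top_down([c4, c5], weights=[1.0, 1.0]))  # [[[11.0, 12.0], [13.0, 14.0]], [[10.0]]]
```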