<h1>Accelerated Deep Learning inference from your browser</h1>
<p><img alt="logo" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/336b6920ac43dfb1532e775415bafc8860564570/images/inaccel/logo.png"/></p>
<p>Data scientists and ML engineers can now speed up their deep learning applications using the power of FPGA accelerators, right from their browser.</p>
<p>FPGAs are adaptable hardware platforms that can offer great performance, low latency, and reduced OpEx for applications like machine learning, video processing, and quantitative finance. However, easy and efficient deployment has been challenging for users with no prior FPGA knowledge.</p>
<p><a href="https://inaccel.com/" target="_blank">InAccel</a>, a pioneer in application acceleration, makes the power of FPGA acceleration accessible from your browser. Data scientists and ML engineers can now easily deploy and manage FPGAs, speeding up compute-intensive workloads and reducing total cost of ownership with zero code changes.</p>
<p>InAccel provides an <a href="https://inaccel.com/" target="_blank">FPGA resource manager</a> that allows instant deployment, scaling, and resource management of FPGAs, making it easier than ever to utilize FPGAs for applications like machine learning, data processing, and data analytics. Users can deploy their applications from Python, Spark, Jupyter notebooks, or even the terminal.</p>
<p>Through the JupyterHub integration, users can now enjoy all the benefits that JupyterHub provides, such as easy access to a computational environment for instant execution of Jupyter notebooks. At the same time, they get the benefits of FPGAs, such as lower latency, shorter execution time, and much higher performance, without any prior knowledge of FPGAs. InAccel's framework allows the use of Xilinx's <a href="https://www.xilinx.com/products/design-tools/vitis/vitis-libraries.html#libraries" target="_blank">Vitis open-source optimized libraries</a> or 3<sup>rd</sup>-party IP cores (for deep learning, machine learning, data analytics, genomics, compression, encryption, and computer vision applications).</p>
<p>The Accelerated Machine Learning Platform provided by InAccel's FPGA orchestrator can be used either on-premises or in the cloud. That way, users can enjoy the simplicity of Jupyter notebooks while experiencing significant speedups in their applications.</p>
<p>Users can test the available libraries for free on the InAccel cluster at the following link:</p>
<p><a href="https://inaccel.com/accelerated-data-science/" target="_blank">https://inaccel.com/accelerated-data-science/</a></p>
<h3>Accelerated Inference – A use case on ResNet50</h3>
<p>Any user can now enjoy the speedup of FPGA accelerators from their browser. In the following deep learning example, we show how users can get much faster ResNet50 inference from the same Keras Python notebook with zero code changes.</p>
<p>Users can log in to the InAccel portal using their Google account at <a href="https://labs.inaccel.com:8000" target="_blank">https://labs.inaccel.com:8000</a>.</p>
<p>There, they can find a ready-to-use Keras example for ResNet50.</p>
<p><img alt="notebook1" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/336b6920ac43dfb1532e775415bafc8860564570/images/inaccel/notebook1.png"/></p>
<p>Notice that the Python code is exactly the same as the code that would run on any CPU. In this example, however, users can experience up to 2,000 FPS of ResNet50 inference with zero code changes.</p>
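<p>Concretely, "zero code changes" means only the import path differs from a stock Keras notebook. Here is a minimal sketch, assuming the <code>inaccel.keras</code> package mirrors the standard Keras application API (as the full listing at the end of this post suggests):</p>
<div class="highlight">
<pre># Stock Keras, running on a CPU/GPU:
# from tensorflow.keras.applications.resnet50 import ResNet50, decode_predictions
# FPGA-accelerated drop-in replacement -- only the import path changes:
from inaccel.keras.applications.resnet50 import ResNet50, decode_predictions

model = ResNet50(weights='imagenet')  # same API as stock Keras
</pre>
</div>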
<p>The user can test the accelerated Keras ResNet50 inference example either with the provided dataset (22,000 images) or with a dataset of their own.</p>
<p>They can also confirm that the results are correct using the validation code, as shown below.</p>
<p><img alt="notebook2" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/336b6920ac43dfb1532e775415bafc8860564570/images/inaccel/notebook2.png"/></p>
<p>Note: The platform is available for demonstration purposes. Multiple users may be accessing the available cluster with its two Alveo cards, which can affect the platform's performance. If you are interested in deploying your own data center with multiple FPGA cards, or in running your applications on the cloud exclusively, contact us at <a href="mailto:info@inaccel.com">info@inaccel.com</a>.</p>
<p><img alt="arch" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/336b6920ac43dfb1532e775415bafc8860564570/images/inaccel/arch.png"/></p>
<p>Figure 1. Acceleration of ML, vision, finance, data analytics from your browser using Jupyter</p>
<p><span>You can also check the online video here: </span><a href="https://www.youtube.com/watch?v=42bsjdXVmFg" target="_blank">https://www.youtube.com/watch?v=42bsjdXVmFg</a></p>
<p><a href="https://www.youtube.com/watch?v=42bsjdXVmFg"><img alt="screenshot" height="364" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/2bf837290667d7e81da1f3ed09113fbc4ca32937/images/inaccel/screenshot.png" width="668"/></a></p>
<p><em>About InAccel, Inc.</em></p>
<p><em>InAccel helps enterprises speed up their applications by using adaptive hardware accelerators. It provides a unique framework for the seamless utilization of hardware accelerators from high-level frameworks like Spark and Jupyter. InAccel also develops high-performance accelerators for applications like machine learning, compression, and data analytics. For more information, visit <a href="https://inaccel.com" target="_blank">https://inaccel.com</a>.</em></p>
<p>The full code is available below:</p>
<div class="highlight">
<pre># Download and unzip the test data.
import os
import urllib.request
import zipfile

url = "https://github.com/Tony607/blog_statics/releases/download/v1.0/mini_test.zip"
fname = os.path.split(url)[-1]
urllib.request.urlretrieve(url, fname)
with zipfile.ZipFile(fname, 'r') as zip_ref:
    zip_ref.extractall('.')

# Run the inference test.
import numpy as np
import time
from inaccel.keras.applications.resnet50 import decode_predictions, ResNet50
from inaccel.keras.preprocessing.image import ImageDataGenerator, load_img

model = ResNet50(weights='imagenet')
data = ImageDataGenerator(dtype='int8')
images = data.flow_from_directory('mini_test/', target_size=(224, 224), class_mode=None, batch_size=64)

begin = time.monotonic()
preds = model.predict(images, workers=10)
end = time.monotonic()

print('Duration for', len(preds), 'images: %.3f sec' % (end - begin))
print('FPS: %.3f' % (len(preds) / (end - begin)))
</pre>
</div>
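<p>To turn the raw class scores into human-readable labels, the <code>decode_predictions</code> helper imported above can be used. A short sketch, assuming it mirrors the standard Keras helper of the same name (top-3 labels for the first image):</p>
<div class="highlight">
<pre># decode_predictions returns, per image, a list of (class_id, label, score) tuples.
for class_id, label, score in decode_predictions(preds, top=3)[0]:
    print(label, score)
</pre>
</div>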
<h1>How to run SSD Mobilenet V2 object detection on Jetson Nano at 20+ FPS</h1>
<p><img alt="jetson_nano_ssd_v2" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/336b6920ac43dfb1532e775415bafc8860564570/images/jetson/jetson_nano_ssd_v2.png"/></p>
<p>TL;DR</p>
<p>First, make sure you have flashed the latest <a href="https://developer.nvidia.com/embedded/jetpack">JetPack 4.3</a> on your Jetson Nano development SD card.</p>
<div class="highlight">
<pre># Run the Docker container.
docker run --runtime nvidia --network host --privileged -it docker.io/zcw607/trt_ssd_r32.3.1:0.1.0
# Then run this command to benchmark the inference speed.
python3 trt_ssd_benchmark.py
</pre>
</div>
<p>Then you will see results similar to this.</p>
<p><img alt="ssd_v2_benchmark" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/8573d60cf0f7d81cc5299a2e8a9effe837c5d60d/images/jetson/ssd_v2_benchmark.png"/></p>
<p>Now for a slightly longer description.</p>
<p>I posted <a href="https://www.dlology.com/blog/how-to-run-tensorflow-object-detection-model-on-jetson-nano/">How to run TensorFlow Object Detection model on Jetson Nano</a> about 8 months ago, and have since realized that running SSD MobileNet V1 on the Jetson Nano at around 10 FPS might not be enough for some applications. Besides, that approach consumes too much memory, leaving no room for other memory-intensive applications to run alongside it.</p>
<p>This time, the bigger SSD MobileNet V2 object detection model runs at 20+ FPS: twice as fast, while also cutting memory consumption down to only 32.5% of the Jetson Nano's total 4 GB (i.e., around 1.3 GB), leaving plenty of memory for running other fancy stuff. You may also notice that CPU usage is quite low, only around 10% across the quad-core CPU.</p>
<p><img alt="ssd_v2_benchmark_top" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/8573d60cf0f7d81cc5299a2e8a9effe837c5d60d/images/jetson/ssd_v2_benchmark_top.png"/></p>
<p>To my knowledge, a bag of tricks contributes to the performance boost.</p>
<ul>
<li>TensorRT 6.0.1, which ships with <a href="https://developer.nvidia.com/embedded/jetpack">JetPack 4.3</a>, versus TensorRT 5 in previous releases.</li>
<li>The TensorFlow object detection graph is optimized and converted right on the target hardware, i.e., the Jetson Nano development kit I am using right now. This matters because TensorRT optimizes the graph for the available GPU, so the optimized graph may not perform well on a different GPU.</li>
<li>The model is converted to a more hardware-specific format, the TensorRT engine file. The downside is that it's less flexible, constrained by the hardware and software stack it runs on. More on that later.</li>
<li>Some tricks to save memory and boost speed.</li>
</ul>
<h2>How does it work?</h2>
<p>The command lines you just ran started a Docker container. If you are new to Docker, think of it as a supercharged Anaconda or Python virtual environment that containerizes everything necessary to reproduce my results. If you take a closer look at the <a href="https://github.com/Tony607/jetson_nano_trt_tf_ssd/blob/master/Dockerfile">Dockerfile</a> on my GitHub repo, which describes how the container image was built, you can see how all the dependencies are set up, including all the apt and Python packages.</p>
<p>The Docker image is built upon the latest JetPack 4.3 - L4T R32.3.1 base image. To make an inference with a TensorRT engine file, two important Python packages are required: TensorRT and PyCUDA. Building the PyCUDA Python package from source on the Jetson Nano can take some time, so I decided to pack the pre-built package into a wheel file to make the Docker build process much smoother. Notice that PyCUDA prebuilt with JetPack 4.3 is not compatible with older versions of JetPack, and vice versa. As for the TensorRT Python package, it came from the Jetson Nano directory <code>/usr/lib/python3.6/dist-packages/tensorrt/</code>; all I did was zip that directory into a <code>tensorrt.tar.gz</code> file. Guess what: no TensorFlow GPU Python package is required at inference time. Consider how much memory we can save just by skipping the import of the TensorFlow GPU Python package.</p>
<p>You can find the TensorRT engine file built with JetPack 4.3, named <strong>TRT_ssd_mobilenet_v2_coco.bin</strong>, at <a href="https://github.com/Tony607/jetson_nano_trt_tf_ssd/tree/master/packages/jetpack4.3">my GitHub repository</a>. Sometimes you might also see TensorRT engine files named with the <code>*.engine</code> extension, as in the JetBot system image. If you want to convert the file yourself, take a look at JK Jung's <a href="https://github.com/jkjung-avt/tensorrt_demos/blob/master/ssd/build_engine.py">build_engine.py</a> script.</p>
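<p>For a sense of how such an engine file is consumed, here is a minimal sketch of deserializing it with the TensorRT Python API (the benchmark script in the repo handles the full input/output buffer management on top of this):</p>
<div class="highlight">
<pre>import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

# Deserialize the pre-built engine and create an execution context for inference.
with open('TRT_ssd_mobilenet_v2_coco.bin', 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
</pre>
</div>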
<p>Now, for the limitation of the TensorRT engine file approach: it simply won't work across different JetPack versions. The reason comes from how the engine file is built, by searching through CUDA kernels for the fastest implementation available. It is therefore necessary to build the engine on the same GPU and software stack (CUDA, cuDNN, TensorRT, etc.) that the optimized engine will run on. A TensorRT engine file is like a dress tailored exclusively for one setup, but its performance is amazing when fitted to the right person/dev board.</p>
<p>Another limitation that comes with the speed boost and lower memory footprint is a loss of precision. Take the following prediction result as an example: a dog is mistakenly predicted as a bear. This might be a result of quantizing the model weights from FP32 to FP16, or of other optimization trade-offs.</p>
<p><img alt="result" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/8573d60cf0f7d81cc5299a2e8a9effe837c5d60d/images/jetson/result.jpg"/></p>
<h2><span>Some tricks to save memory and boost speed</span></h2>
<p>Shut down the GUI and run in command-line mode. If you are already inside the GUI desktop environment, simply press "Ctrl+Alt+F2" to enter non-GUI mode, log in to your account from there, and type "<code>service gdm stop</code>". That will stop the Ubuntu GUI environment and save you around 8% of the 4GB memory.</p>
<p>Force the CPU and GPU to their maximum clock speeds by typing "jetson_clocks" on the command line. If you have a PWM fan attached to the board and are bothered by the fan's noise, you can tune it down by creating a new settings file like this.</p>
<div class="highlight">
<pre>cd ~
jetson_clocks
jetson_clocks --store
sed -i 's/target_pwm:255/target_pwm:30/g' l4t_dfs.conf
jetson_clocks --restore l4t_dfs.conf
</pre>
</div>
<h2><span>Conclusion and further reading</span></h2>
<p>This guide has shown you the easiest way to reproduce my results of running SSD MobileNet V2 object detection on the Jetson Nano at 20+ FPS. It also explained how the approach works and the limitations to be aware of before applying it to a real application.</p>
<p><span>Don't forget to grab the source code for this post on <a href="https://github.com/Tony607/jetson_nano_trt_tf_ssd">my GitHub</a>.</span></p>
<h1>Automatic Defect Inspection with End-to-End Deep Learning</h1>
<p><img alt="defect-detection" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/336b6920ac43dfb1532e775415bafc8860564570/images/segmentation/defect-detection.png"/></p>
<p>In this tutorial, I will show you how to build a deep learning model to find defects on a surface, a popular application in many industrial inspection scenarios.</p>
<p><img alt="industrial-applications" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/5e088805c7758be7e0317efeceaae8fcf2b47b97/images/segmentation/industrial-applications.png"/></p>
<p><em>Courtesy of Nvidia</em></p>
<h2>Build the model</h2>
<p>We will apply U-Net as the DL model for 2D industrial defect inspection. When there is a shortage of labeled data and fast performance is needed, U-Net is a great choice. The basic architecture is an encoder-decoder pair with skip connections that combine low-level feature maps with higher-level ones. To verify the effectiveness of our model, we will use the <a href="https://resources.mpi-inf.mpg.de/conference/dagm/2007/prizes.html">DAGM dataset</a>. A benefit of U-Net is that it doesn't contain any dense layers, so the trained model is typically scale invariant, meaning it need not be retrained to work across multiple input sizes.</p>
<p>Here is the model structure.</p>
<p><img alt="u-net-model" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/5e088805c7758be7e0317efeceaae8fcf2b47b97/images/segmentation/u-net-model.png"/></p>
<p>As you can see, we use four 2x2 max-pool operations for downsampling, each of which halves the resolution, for a 16x total reduction. On the right side, 2x2 Conv2DTranspose layers (also called deconvolution) upsample the image back to its original resolution. For the downsampling and upsampling to work, the image resolution must be divisible by 16 (or 2<sup>4</sup>), which is why we resized the input images and masks to 512x512 from the original DAGM size of 500x500.</p>
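<p>A quick sanity check of that arithmetic:</p>
<div class="highlight">
<pre># Four 2x2 max-pool stages each halve the spatial resolution,
# so each input side must be divisible by 2**4 = 16.
assert 512 % 2**4 == 0  # OK: 512 -> 256 -> 128 -> 64 -> 32
assert 500 % 2**4 != 0  # the original 500x500 DAGM images don't divide evenly
</pre>
</div>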
<p>The skip connections from earlier layers in the network (prior to a downsampling operation) should provide the necessary detail to reconstruct accurate shapes for segmentation boundaries. Indeed, we can recover more fine-grained detail with the addition of these skip connections.</p>
<p>It is simple to compose such a model with the Keras functional API.</p>
<div class="highlight">
<pre>from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Lambda, Conv2DTranspose, concatenate

# Input resolution, resized from the original 500x500 as discussed above.
img_rows, img_cols = 512, 512

def get_small_unet():
    inputs = Input((img_rows, img_cols, 1))
    # Normalize pixel values from [0, 255] to [-1, 1].
    inputs_norm = Lambda(lambda x: x / 127.5 - 1.)(inputs)
    # Encoder: two 3x3 convolutions followed by 2x2 max pooling, four times.
    conv1 = Conv2D(16, (3, 3), activation='relu', padding='same')(inputs_norm)
    conv1 = Conv2D(16, (3, 3), activation='relu', padding='same')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(32, (3, 3), activation='relu', padding='same')(pool1)
    conv2 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool2)
    conv3 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)
    conv4 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool3)
    conv4 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)
    conv5 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool4)
    conv5 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv5)
    # Decoder: Conv2DTranspose upsampling with skip connections to the encoder.
    up6 = concatenate([Conv2DTranspose(64, kernel_size=(2, 2), strides=(2, 2), padding='same')(conv5), conv4], axis=3)
    conv6 = Conv2D(128, (3, 3), activation='relu', padding='same')(up6)
    conv6 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv6)
    up7 = concatenate([Conv2DTranspose(32, kernel_size=(2, 2), strides=(2, 2), padding='same')(conv6), conv3], axis=3)
    conv7 = Conv2D(64, (3, 3), activation='relu', padding='same')(up7)
    conv7 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv7)
    up8 = concatenate([Conv2DTranspose(16, kernel_size=(2, 2), strides=(2, 2), padding='same')(conv7), conv2], axis=3)
    conv8 = Conv2D(32, (3, 3), activation='relu', padding='same')(up8)
    conv8 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv8)
    up9 = concatenate([Conv2DTranspose(8, kernel_size=(2, 2), strides=(2, 2), padding='same')(conv8), conv1], axis=3)
    conv9 = Conv2D(16, (3, 3), activation='relu', padding='same')(up9)
    conv9 = Conv2D(16, (3, 3), activation='relu', padding='same')(conv9)
    # 1x1 convolution with sigmoid produces the per-pixel defect probability mask.
    conv10 = Conv2D(1, (1, 1), activation='sigmoid')(conv9)
    model = Model(inputs=inputs, outputs=conv10)
    return model

model = get_small_unet()
</pre>
</div>
<h2>Loss and metrics</h2>
<p>The most commonly used loss function for the task of image segmentation is a <strong>pixel-wise cross-entropy loss</strong>. This loss examines each pixel individually, comparing the class predictions (depth-wise pixel vector) to our one-hot encoded target vector.</p>
<p>Because the cross-entropy loss evaluates the class predictions for each pixel vector individually and then averages over all pixels, we are essentially assigning equal weight to learning each pixel in the image. This can be a problem if your classes have unbalanced representation in the image, as training can be dominated by the most prevalent class. In our case, it is the foreground-to-background imbalance.</p>
<p>Another popular loss function for image segmentation tasks is based on the <strong>Dice coefficient</strong>, which is essentially a measure of overlap between two samples. This measure ranges from 0 to 1 where a Dice coefficient of 1 denotes perfect and complete overlap. The Dice coefficient was originally developed for binary data, and can be calculated as:</p>
<p><img alt="dice-coefficient" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/5e088805c7758be7e0317efeceaae8fcf2b47b97/images/segmentation/dice-coefficient.png"/></p>
<p>where |A∩B| represents the common elements between sets A and B, and |A| represents the number of elements in set A (and likewise for set B).</p>
<p>With respect to the neural network output, the numerator is concerned with the common activations between our prediction and the target mask, whereas the denominator is concerned with the number of activations in each mask separately. This has the effect of normalizing the loss according to the size of the target mask, so the soft Dice loss does not struggle to learn from classes with less spatial representation in an image.</p>
<p>Here we use add-one, or Laplace, smoothing, which simply adds one to each count. Add-one smoothing can be interpreted as a uniform prior, which reduces overfitting and helps the model converge.</p>
<p>Here is how to implement a smoothed Dice coefficient loss.</p>
<div class="highlight">
<pre>from tensorflow.keras import backend as K

def smooth_dice_coeff(smooth=1.):
    smooth = float(smooth)

    # Smoothed IOU/Dice coefficient computed on the flattened masks.
    def IOU_calc(y_true, y_pred):
        y_true_f = K.flatten(y_true)
        y_pred_f = K.flatten(y_pred)
        intersection = K.sum(y_true_f * y_pred_f)
        return 2 * (intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

    # Negate the coefficient so it can be minimized as a loss.
    def IOU_calc_loss(y_true, y_pred):
        return -IOU_calc(y_true, y_pred)

    return IOU_calc, IOU_calc_loss

IOU_calc, IOU_calc_loss = smooth_dice_coeff(1)
</pre>
</div>
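<p>The returned metric and loss plug straight into <code>model.compile</code>. A short sketch (the choice of the Adam optimizer here is an assumption, not specified above):</p>
<div class="highlight">
<pre># Train with the smoothed Dice loss, reporting the Dice coefficient as a metric.
model.compile(optimizer='adam', loss=IOU_calc_loss, metrics=[IOU_calc])
# For the cross-entropy baseline compared below, swap in:
# model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[IOU_calc])
</pre>
</div>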
<p>Here we compared the performance of the binary cross-entropy loss and the smoothed Dice coefficient loss.</p>
<p><img alt="binary-cross-entropy-loss" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/5e088805c7758be7e0317efeceaae8fcf2b47b97/images/segmentation/binary-cross-entropy-loss.png"/></p>
<p><img alt="dice-coefficient-loss" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/5e088805c7758be7e0317efeceaae8fcf2b47b97/images/segmentation/dice-coefficient-loss.png"/></p>
<p>As you can see, the model trained with the Dice coefficient loss converged faster and achieved a better final IOU accuracy. As for the final test predictions, the model trained with the Dice coefficient loss also delivered sharper segmentation edges, outperforming the model trained with the cross-entropy loss.</p>
<h2>Conclusion and further reading</h2>
<p>In this quick tutorial, you have learned how to build a deep learning model that can be trained end to end and detect defects for industrial applications. The <a href="https://resources.mpi-inf.mpg.de/conference/dagm/2007/prizes.html">DAGM dataset</a> used in this post is relatively simple, which makes it well suited to fast prototyping and verification. In the real world, however, image data might contain much richer context, which requires a deeper and more complex model to comprehend. One simple way to accomplish this is by experimenting with an increased number of kernels in the CNN layers. There are other options as well, like <a href="https://arxiv.org/abs/1611.09326">this paper</a>, in which the authors propose replacing each CNN block with a dense block, which can be more capable of learning complex contextual features.</p>
<p>You can reproduce the results of this post by running <a href="https://colab.research.google.com/github/Tony607/Industrial-Defect-Inspection-segmentation/blob/master/Industrial_Defect_Inspection_with_image_segmentation.ipynb">this notebook</a> on Google Colab with a free GPU.</p>
<p>Source code is available on <a href="https://github.com/Tony607/Industrial-Defect-Inspection-segmentation">my GitHub</a>.</p>
<h1>How to train Detectron2 with Custom COCO Datasets</h1>
<p><img alt="detectron2-custom" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/dc36080eb69ef185009043cc4229fed36f91b188/images/detectron2/detectron2-custom.png"/></p>
<p>Along with the latest PyTorch 1.3 release came the next-generation, ground-up rewrite of its previous object detection framework, now called Detectron2. This tutorial will help you get started with the framework by training an instance segmentation model with your custom COCO datasets. If you want to know how to create COCO datasets, please read my previous post - <a href="https://www.dlology.com/blog/how-to-create-custom-coco-data-set-for-instance-segmentation/">How to create custom COCO data set for instance segmentation</a>.</p>
<p>For a quick start, we will do our experiment in <a href="https://colab.research.google.com/github/Tony607/detectron2_instance_segmentation_demo/blob/master/Detectron2_custom_coco_data_segmentation.ipynb">a Colab Notebook</a> so you don't need to worry about setting up the development environment on your own machine before getting comfortable with Pytorch 1.3 and Detectron2.</p>
<h2>Install Detectron2</h2>
<p>In the Colab notebook, just run these four lines to install the latest PyTorch 1.3 and Detectron2.</p>
<div class="highlight">
<pre>!pip install -U torch torchvision
!pip install git+https://github.com/facebookresearch/fvcore.git
!git clone https://github.com/facebookresearch/detectron2 detectron2_repo
!pip install -e detectron2_repo
</pre>
</div>
<p>Click "RESTART RUNTIME" in the cell's output to let your installation take effect.</p>
<p><img alt="restart" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/dc36080eb69ef185009043cc4229fed36f91b188/images/detectron2/restart.png"/></p>
<h2>Register a COCO dataset</h2>
<p>To tell Detectron2 how to obtain your dataset, we are going to "register" it.</p>
<p>To demonstrate this process, we use <a href="https://github.com/Tony607/mmdetection_instance_segmentation_demo" rel="nofollow" target="_blank">the fruits nuts segmentation dataset</a>, which has only 3 classes: date, fig, and hazelnut. We'll train a segmentation model from an existing model pre-trained on the COCO dataset, available in detectron2's model zoo.</p>
<p>You can download the dataset like this.</p>
<div class="highlight">
<pre># Download and decompress the data.
!wget https://github.com/Tony607/detectron2_instance_segmentation_demo/releases/download/V0.1/data.zip
!unzip data.zip > /dev/null
</pre>
</div>
<p><span>Or you can upload your own dataset from here.</span></p>
<p><img alt="upload" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/dc36080eb69ef185009043cc4229fed36f91b188/images/detectron2/upload.png"/></p>
<p><span>Register the <strong>fruits_nuts</strong> dataset to detectron2, following the </span><a href="https://github.com/facebookresearch/detectron2/blob/master/docs/tutorials/datasets.md" rel="nofollow" target="_blank">detectron2 custom dataset tutorial</a><span>.</span></p>
<div class="highlight">
<pre><span class="kn">from</span> <span class="nn">detectron2.data.datasets</span> <span class="kn">import</span> <span class="n">register_coco_instances</span>
<span class="n">register_coco_instances</span><span class="p">(</span><span class="s">"fruits_nuts"</span><span class="p">,</span> <span class="p">{},</span> <span class="s">"./data/trainval.json"</span><span class="p">,</span> <span class="s">"./data/images"</span><span class="p">)</span>
</pre>
</div>
<p>Each dataset is associated with some metadata. In our case, it is accessible by calling <code>fruits_nuts_metadata = MetadataCatalog.get("fruits_nuts")</code>, which will give you</p>
<div class="highlight">
<pre><span class="n">Metadata</span><span class="p">(</span><span class="n">evaluator_type</span><span class="o">=</span><span class="s">'coco'</span><span class="p">,</span> <span class="n">image_root</span><span class="o">=</span><span class="s">'./data/images'</span><span class="p">,</span> <span class="n">json_file</span><span class="o">=</span><span class="s">'./data/trainval.json'</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'fruits_nuts'</span><span class="p">,</span>
<span class="n">thing_classes</span><span class="o">=</span><span class="p">[</span><span class="s">'date'</span><span class="p">,</span> <span class="s">'fig'</span><span class="p">,</span> <span class="s">'hazelnut'</span><span class="p">],</span> <span class="n">thing_dataset_id_to_contiguous_id</span><span class="o">=</span><span class="p">{</span><span class="mi">1</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">:</span> <span class="mi">2</span><span class="p">})</span>
</pre>
</div>
<p>To see the actual internal representation of how the catalog stores information about the datasets and how to obtain them, you can call <code>dataset_dicts = DatasetCatalog.get("fruits_nuts")</code>. The internal format uses one dict to represent the annotations of one image.</p>
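<p>A minimal sketch of pulling both out and inspecting the first record (the imports come from detectron2's data API):</p>
<div class="highlight">
<pre>from detectron2.data import DatasetCatalog, MetadataCatalog

dataset_dicts = DatasetCatalog.get("fruits_nuts")
fruits_nuts_metadata = MetadataCatalog.get("fruits_nuts")
# One dict per image: file_name, height, width, image_id, annotations, ...
print(dataset_dicts[0]["file_name"], len(dataset_dicts[0]["annotations"]))
</pre>
</div>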
<p><span>To verify the data loading is correct, let's visualize the annotations of randomly selected samples in the dataset:</span></p>
<div class="highlight">
<pre>import random
import cv2
from google.colab.patches import cv2_imshow  # Colab helper for displaying images
from detectron2.utils.visualizer import Visualizer

for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=fruits_nuts_metadata, scale=0.5)
    vis = visualizer.draw_dataset_dict(d)
    cv2_imshow(vis.get_image()[:, :, ::-1])
</pre>
</div>
<p><span>One of the images might show this.</span></p>
<p><span><img alt="vis_annotation" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/dc36080eb69ef185009043cc4229fed36f91b188/images/detectron2/vis_annotation.png"/></span></p>
<h2>Train the model</h2>
<p><span>Now, let's fine-tune a coco-pretrained R50-FPN Mask R-CNN model on the fruits_nuts dataset. It takes ~6 minutes to train 300 iterations on Colab's K80 GPU.</span></p>
<div class="highlight">
<pre><span class="kn">from</span> <span class="nn">detectron2.engine</span> <span class="kn">import</span> <span class="n">DefaultTrainer</span>
<span class="kn">from</span> <span class="nn">detectron2.config</span> <span class="kn">import</span> <span class="n">get_cfg</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="n">cfg</span> <span class="o">=</span> <span class="n">get_cfg</span><span class="p">()</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">merge_from_file</span><span class="p">(</span>
<span class="s">"./detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"</span>
<span class="p">)</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">DATASETS</span><span class="o">.</span><span class="n">TRAIN</span> <span class="o">=</span> <span class="p">(</span><span class="s">"fruits_nuts"</span><span class="p">,)</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">DATASETS</span><span class="o">.</span><span class="n">TEST</span> <span class="o">=</span> <span class="p">()</span> <span class="c1"># no metrics implemented for this dataset</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">DATALOADER</span><span class="o">.</span><span class="n">NUM_WORKERS</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">MODEL</span><span class="o">.</span><span class="n">WEIGHTS</span> <span class="o">=</span> <span class="s">"detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"</span> <span class="c1"># initialize from model zoo</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">SOLVER</span><span class="o">.</span><span class="n">IMS_PER_BATCH</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">SOLVER</span><span class="o">.</span><span class="n">BASE_LR</span> <span class="o">=</span> <span class="mf">0.02</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">SOLVER</span><span class="o">.</span><span class="n">MAX_ITER</span> <span class="o">=</span> <span class="p">(</span>
<span class="mi">300</span>
<span class="p">)</span> <span class="c1"># 300 iterations seems good enough, but you can certainly train longer</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">MODEL</span><span class="o">.</span><span class="n">ROI_HEADS</span><span class="o">.</span><span class="n">BATCH_SIZE_PER_IMAGE</span> <span class="o">=</span> <span class="p">(</span>
<span class="mi">128</span>
<span class="p">)</span> <span class="c1"># faster, and good enough for this toy dataset</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">MODEL</span><span class="o">.</span><span class="n">ROI_HEADS</span><span class="o">.</span><span class="n">NUM_CLASSES</span> <span class="o">=</span> <span class="mi">3</span> <span class="c1"># 3 classes (data, fig, hazelnut)</span>
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">cfg</span><span class="o">.</span><span class="n">OUTPUT_DIR</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">trainer</span> <span class="o">=</span> <span class="n">DefaultTrainer</span><span class="p">(</span><span class="n">cfg</span><span class="p">)</span>
<span class="n">trainer</span><span class="o">.</span><span class="n">resume_or_load</span><span class="p">(</span><span class="n">resume</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">trainer</span><span class="o">.</span><span class="n">train</span><span class="p">()</span>
</pre>
</div>
<p>If you switch to your own dataset, change the number of classes, the learning rate, and the maximum number of iterations accordingly.</p>
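<p>For example, for a hypothetical five-class dataset on which you want a gentler learning rate and a longer schedule, the adjustments could look like this (the numbers below are illustrative placeholders, not tuned values):</p>
<div class="highlight">
<pre># Hypothetical adjustments for a different custom dataset.
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5  # match the number of classes in your dataset
cfg.SOLVER.BASE_LR = 0.0025          # smaller learning rate for more stable training
cfg.SOLVER.MAX_ITER = 1000           # train longer on a larger dataset
</pre>
</div>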
<p><img alt="train" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/dc36080eb69ef185009043cc4229fed36f91b188/images/detectron2/train.png"/></p>
<h2>Make a prediction</h2>
<p><span>Now, we perform inference with the trained model on the fruits_nuts dataset. First, let's create a predictor using the model we just trained:</span></p>
<div class="highlight">
<pre><span class="n">cfg</span><span class="o">.</span><span class="n">MODEL</span><span class="o">.</span><span class="n">WEIGHTS</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">cfg</span><span class="o">.</span><span class="n">OUTPUT_DIR</span><span class="p">,</span> <span class="s">"model_final.pth"</span><span class="p">)</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">MODEL</span><span class="o">.</span><span class="n">ROI_HEADS</span><span class="o">.</span><span class="n">SCORE_THRESH_TEST</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="c1"># set the testing threshold for this model</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">DATASETS</span><span class="o">.</span><span class="n">TEST</span> <span class="o">=</span> <span class="p">(</span><span class="s">"fruits_nuts"</span><span class="p">,</span> <span class="p">)</span>
<span class="n">predictor</span> <span class="o">=</span> <span class="n">DefaultPredictor</span><span class="p">(</span><span class="n">cfg</span><span class="p">)</span>
</pre>
</div>
<p><span>Then, we randomly select several samples to visualize the prediction results.</span></p>
<div class="highlight">
<pre><span class="kn">from</span> <span class="nn">detectron2.utils.visualizer</span> <span class="kn">import</span> <span class="n">ColorMode</span>
<span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">random</span><span class="o">.</span><span class="n">sample</span><span class="p">(</span><span class="n">dataset_dicts</span><span class="p">,</span> <span class="mi">3</span><span class="p">):</span>
<span class="n">im</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">imread</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s">"file_name"</span><span class="p">])</span>
<span class="n">outputs</span> <span class="o">=</span> <span class="n">predictor</span><span class="p">(</span><span class="n">im</span><span class="p">)</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">Visualizer</span><span class="p">(</span><span class="n">im</span><span class="p">[:,</span> <span class="p">:,</span> <span class="p">::</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span>
<span class="n">metadata</span><span class="o">=</span><span class="n">fruits_nuts_metadata</span><span class="p">,</span>
<span class="n">scale</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span>
<span class="n">instance_mode</span><span class="o">=</span><span class="n">ColorMode</span><span class="o">.</span><span class="n">IMAGE_BW</span> <span class="c1"># remove the colors of unsegmented pixels</span>
<span class="p">)</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">v</span><span class="o">.</span><span class="n">draw_instance_predictions</span><span class="p">(</span><span class="n">outputs</span><span class="p">[</span><span class="s">"instances"</span><span class="p">]</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s">"cpu"</span><span class="p">))</span>
<span class="n">cv2_imshow</span><span class="p">(</span><span class="n">v</span><span class="o">.</span><span class="n">get_image</span><span class="p">()[:,</span> <span class="p">:,</span> <span class="p">::</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
</pre>
</div>
<p>Here is a sample image with the predictions overlaid.</p>
<p><img alt="prediction" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/dc36080eb69ef185009043cc4229fed36f91b188/images/detectron2/prediction.png"/></p>
<h2>Conclusion and further thought</h2>
<p>You might have read <a href="https://www.dlology.com/blog/how-to-train-an-object-detection-model-with-mmdetection/">my previous tutorial</a> on a similar object detection framework named MMdetection, also built upon PyTorch. So how does Detectron2 compare with it? Here are a few of my thoughts.</p>
<p>Both frameworks are easy to configure with a config file that describes how you want to train a model. Detectron2's YAML config files are more efficient for two reasons. First, you can reuse configs by writing a "base" config and building the final training configs on top of it, which reduces duplication. Second, the config file can be loaded first and then modified further in Python code as necessary, which makes it more flexible.</p>
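<p>To illustrate the first point, a derived Detectron2 config inherits from a base file via the <code>_BASE_</code> key and only overrides what differs. A minimal sketch (abbreviated, not the full file) looks roughly like this:</p>
<div class="highlight">
<pre># Sketch of a derived Detectron2 YAML config (abbreviated).
_BASE_: "../Base-RCNN-FPN.yaml"  # inherit everything from the base config
MODEL:
  MASK_ON: True                  # override only what differs
  RESNETS:
    DEPTH: 50
</pre>
</div>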
<p>What about the inference speed? Simply put, Detectron2 is slightly faster than MMdetection for the same Mask R-CNN ResNet-50 FPN model: MMdetection achieves 2.45 FPS while Detectron2 achieves 2.59 FPS, a 5.7% speedup when inferencing a single image. The benchmark is based on the following code.</p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">time</span>
<span class="n">times</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">20</span><span class="p">):</span>
<span class="n">start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
<span class="n">outputs</span> <span class="o">=</span> <span class="n">predictor</span><span class="p">(</span><span class="n">im</span><span class="p">)</span>
<span class="n">delta</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">start_time</span>
<span class="n">times</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">delta</span><span class="p">)</span>
<span class="n">mean_delta</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">times</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
<span class="n">fps</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">mean_delta</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Average(sec):{:.2f},fps:{:.2f}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">mean_delta</span><span class="p">,</span> <span class="n">fps</span><span class="p">))</span>
</pre>
</div>
<p>There you have it: Detectron2 makes it super simple to train a custom instance segmentation model on a custom dataset. You might find the following resources helpful.</p>
<p>My previous post <span>- </span><a href="https://www.dlology.com/blog/how-to-create-custom-coco-data-set-for-instance-segmentation/">How to create custom COCO data set for instance segmentation</a><span>.</span></p>
<p><span>My previous post - <a href="https://www.dlology.com/blog/how-to-train-an-object-detection-model-with-mmdetection/">How to train an object detection model with mmdetection</a>.</span></p>
<p><span><a href="https://github.com/facebookresearch/detectron2">Detectron2 GitHub repository</a>.</span></p>
<p><span>The runnable <a href="https://colab.research.google.com/github/Tony607/detectron2_instance_segmentation_demo/blob/master/Detectron2_custom_coco_data_segmentation.ipynb">Colab Notebook</a> for this post.</span></p>Getting started with VS CODE remote development2019-09-22T05:02:08+00:002024-03-19T09:43:09+00:00Chengweihttps://www.dlology.com/blog/author/Chengwei/https://www.dlology.com/blog/getting-started-with-vscode-remote-development/<p><img alt="remote_dev" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/4c335fa0551812e0dde537eff9ba72404361776b/images/vscode/remote_dev.png"/></p>
<p>Let's say you have a headless GPU virtual instance on the cloud or a headless physical machine. There are several options, like remote desktop or Jupyter Notebook, that can provide you with a desktop-like development experience; however, the VS Code Remote Development extension can be more flexible than a Jupyter notebook and more responsive than a remote desktop. I will show you step by step how to set it up on Windows.</p>
<h2>Start OpenSSH service</h2>
<p>First, let's make sure SSH is set up on your server. Most likely your online server instance will have an OpenSSH server preconfigured; the command below checks whether it is running.</p>
<pre>service ssh status</pre>
<p>If you see something like this, you are good to go; otherwise, install or start the OpenSSH server.</p>
<pre>● ssh.service - OpenBSD Secure Shell server<br/> Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)<br/> Active: active (running) since Tue 2019-09-17 19:58:43 CST; 4 days ago<br/> Main PID: 600 (sshd)<br/> Tasks: 1 (limit: 1109)<br/> CGroup: /system.slice/ssh.service<br/> └─600 /usr/sbin/sshd -D</pre>
<p>On an Ubuntu system, you can install the OpenSSH server and optionally change the default port 22 like this:</p>
<div class="highlight">
<pre><span class="n">sudo</span> <span class="n">apt</span><span class="o">-</span><span class="n">get</span> <span class="n">install</span> <span class="n">openssh</span><span class="o">-</span><span class="n">server</span>
<span class="c1"># Optionally change the SSH port inside this file.</span>
<span class="n">sudo</span> <span class="n">vi</span> <span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">ssh</span><span class="o">/</span><span class="n">sshd_config</span>
<span class="n">sudo</span> <span class="n">systemctl</span> <span class="n">restart</span> <span class="n">ssh</span>
</pre>
</div>
<p>Once you have set it up, ssh to this server from your development machine with the IP address, user name, and password, just to verify there are no glitches.</p>
<h2>OpenSSH client on Windows</h2>
<p>This step is painless. For Windows 10 users, it is just a matter of enabling a feature on the Settings page, and it might be enabled already. Here are the steps to verify the feature is enabled.</p>
<p>On the Settings page, go to Apps, then click "Manage optional features", scroll down, and check that "OpenSSH Client" is installed.</p>
<p><img alt="openssh1" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/4c335fa0551812e0dde537eff9ba72404361776b/images/vscode/openssh1.png"/></p>
<p><img alt="openssh2" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/4c335fa0551812e0dde537eff9ba72404361776b/images/vscode/openssh2.png"/></p>
<p><img alt="openssh3" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/4c335fa0551812e0dde537eff9ba72404361776b/images/vscode/openssh3.png"/></p>
<h2><span>Setup SSH keys</span></h2>
<p><span>You don't want to type your user name and password every time when you log in to the server, do you?</span></p>
<h4><span>In Windows(your development machine)</span></h4>
<p><span>Here we will generate an SSH key pair in a command prompt:</span></p>
<pre><span>ssh-keygen -t rsa</span></pre>
<p><span>Accept the defaults; you can leave the passphrase empty when following the prompts.</span></p>
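<p><span>If your OpenSSH build supports it, an Ed25519 key is a more modern alternative to RSA; the remaining steps are the same apart from the file name (id_ed25519.pub instead of id_rsa.pub):</span></p>
<pre><span>ssh-keygen -t ed25519</span></pre>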
<p>Copy the output of this command (it works in PowerShell; in a classic command prompt, use <code>type %USERPROFILE%\.ssh\id_rsa.pub</code> instead):</p>
<pre><span>cat ~/.ssh/id_rsa.pub</span></pre>
<p>Then ssh to the server with your user name and password if you haven't already, and run the following commands to append the content you just copied to <code>~/.ssh/authorized_keys</code> on the server.</p>
<div class="highlight">
<pre><span class="n">mkdir</span> <span class="o">-</span><span class="n">p</span> <span class="o">~/.</span><span class="n">ssh</span>
<span class="n">vi</span> <span class="o">~/.</span><span class="n">ssh</span><span class="o">/</span><span class="n">authorized_keys</span>
</pre>
</div>
<p><em>In case you are not familiar with vi: "Shift+End" goes to the end of the line, type "a" to enter append mode, and right-click to paste the content of the clipboard. Once you are done, press "Shift + ;" then type "wq" to write and quit. Hopefully, we won't need to edit our code this way in vi anymore after this.</em></p>
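<p>If you would rather skip the vi editing above, a one-liner from the Windows command prompt can append the key for you (assuming the default key path; substitute your own user name and server IP):</p>
<pre>type %USERPROFILE%\.ssh\id_rsa.pub | ssh &lt;username&gt;@&lt;server ip&gt; "mkdir -p ~/.ssh &amp;&amp; cat &gt;&gt; ~/.ssh/authorized_keys"</pre>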
<p>To verify that the SSH key is set up, start a new command prompt on your Windows machine and type <code>ssh &lt;username&gt;@&lt;server ip&gt;</code>; it should log in automatically without asking for the password.</p>
<h2>Install Remote Development VS CODE Extension</h2>
<p>Open VS Code, click the Extensions tab, then search for "remote development" and install the extension.</p>
<p><img alt="install_extension" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/4c335fa0551812e0dde537eff9ba72404361776b/images/vscode/install_extension.png"/></p>
<p>Once it is installed, you will see a new tab named "Remote Explorer"; click on it, then click the gear button.</p>
<p><img alt="remote_explorer" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/4c335fa0551812e0dde537eff9ba72404361776b/images/vscode/remote_explorer.png"/></p>
<p>Choose the first entry; in my case, it is <code>C:\Users\hasee\.ssh\config</code>. Once you have it open, fill in the alias, hostname, and user. The alias can be any text that helps you remember the machine; the hostname is typically the IP address of the remote machine.</p>
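<p>A minimal entry in that config file might look like this (the alias, IP address, and user name below are placeholders for your own values):</p>
<pre>Host my-gpu-server
    HostName 192.168.1.100
    User ubuntu
    Port 22</pre>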
<p>Once you have this done, just click on the "Connect to Host in New Window" button.</p>
<p>One last step: in the new window, click "Open Folder" in the sidebar to select a folder path on your remote machine, and you are good to go. Type "Ctrl + `" to open a terminal on the remote machine just like you would locally.</p>
<h2>Conclusion and further reading</h2>
<p>Now you have it: a quick tutorial showing you how to set up VS Code Remote Development from scratch, allowing you to enjoy a desktop development experience on a headless remote server.</p>
<p>For more details, refer to the <a href="https://code.visualstudio.com/docs/remote/remote-overview">official VS Code Remote Development page</a>.</p>Recent Advances in Deep Learning for Object Detection - Part 22019-09-03T12:14:39+00:002024-03-18T19:47:35+00:00Chengweihttps://www.dlology.com/blog/author/Chengwei/https://www.dlology.com/blog/recent-advances-in-deep-learning-for-object-detection-part-2/<p><img alt="advance2" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/c9ce73d8a1b059c644f99d96e0cfee8f2a394fe2/images/object-detection/advance2.png"/></p>
<p>In the second part of the Recent Advances in Deep Learning for Object Detection series, we will summarize three aspects of object detection: <span>proposal generation, feature representation learning, and learning strategy.</span></p>
<h2>Proposal Generation</h2>
<p>A proposal generator produces a set of rectangular bounding boxes that are likely to contain objects.</p>
<p><img alt="rpn" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/c20d707f2e73d049f4a032cb38f78a90c4ea9f68/images/object-detection/rpn.png"/></p>
<p>Proposal generation methods at a glance:</p>
<table>
<tr><th>Category</th><th>Method</th><th>Summary</th></tr>
<tr><td>Traditional Computer Vision Methods</td><td colspan="2">i) computing the "objectness score" of a candidate box; ii) merging super-pixels from original images; iii) generating multiple foreground and background segments. Their primary advantage is that they are very simple and can generate proposals with high recall. However, they are mainly based on low-level visual cues such as color or edges, so they cannot be jointly optimized with the whole detection pipeline and are unable to exploit the power of large-scale datasets to improve representation learning.</td></tr>
<tr><td rowspan="7">Anchor-based Methods</td><td>Region Proposal Network (RPN)</td><td>A 256-dimensional feature vector was extracted from each anchor and fed into two sibling branches: a classification layer and a regression layer. It first evaluated whether an anchor proposal was foreground or background, and performed the categorical classification in the next stage. The anchor priors are manually designed with multiple scales and aspect ratios in a heuristic manner.</td></tr>
<tr><td>SSD (multi-scale anchors)</td><td>Assigned categorical probabilities to each anchor proposal, with the anchor priors likewise designed manually.</td></tr>
<tr><td>Single Shot Scale-invariant Face Detector (S3FD)</td><td>Based on SSD with carefully designed anchors to match the objects; different anchor priors were designed according to the effective receptive field of different feature maps.</td></tr>
<tr><td>Dimension-Decomposition Region Proposal Network (DeRPN)</td><td>Used an anchor string mechanism to independently match object width and height. This helped match objects with large scale variance and reduced the search space.</td></tr>
<tr><td>Single-Shot Refinement Neural Network (RefineDet)</td><td>Refined the manually defined anchors in two steps, significantly improving the anchor quality and final prediction accuracy in a data-driven manner.</td></tr>
<tr><td>Cascade R-CNN</td><td>Refines proposals in a cascaded way.</td></tr>
<tr><td>MetaAnchor</td><td>An improvement over other manually defined methods, but the customized anchors were still designed manually.</td></tr>
<tr><td rowspan="4">Keypoints-based Methods</td><td>DeNet</td><td>Modeled the distribution of being one of the four corner types of objects. This corner-based algorithm eliminated the design of anchors and became a more effective way to produce high-quality proposals.</td></tr>
<tr><td>CornerNet</td><td>Modeled information of the top-left and bottom-right corners, with novel feature embedding methods and a corner pooling layer to correctly match keypoints belonging to the same objects, obtaining state-of-the-art results on public benchmarks.</td></tr>
<tr><td>Feature-Selection-Anchor-Free (FSAF)</td><td>An online feature selection block is applied to train multilevel center-based branches attached to each level of the feature pyramid.</td></tr>
<tr><td>CenterNet</td><td>Combined the ideas of center-based and corner-based methods: first predicted bounding boxes by pairs of corners, then predicted center probabilities of the initial predictions to reject easy negatives.</td></tr>
<tr><td>Other Methods</td><td>AZnet</td><td>Predicted two values: a zoom indicator and adjacency scores. The zoom indicator determined whether to further divide a region that may contain smaller objects, and the adjacency scores denoted objectness. Better at matching sparse and small objects than RPN's anchor-object matching approach.</td></tr>
</table>
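<p>To make the anchor idea concrete, here is a minimal NumPy sketch (my own illustration, not code from any of the frameworks above) that enumerates anchor boxes with multiple scales and aspect ratios around a single feature-map location:</p>
<div class="highlight">
<pre>import numpy as np

def make_anchors(cx, cy, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return (len(scales) * len(ratios), 4) anchors as (x1, y1, x2, y2),
    all centered on (cx, cy)."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)  # keeps w * h == s**2 and w / h == r
            h = s / np.sqrt(r)
            anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

# 3 scales x 3 aspect ratios = 9 anchors per location, as in RPN.
print(make_anchors(100, 100).shape)  # (9, 4)
</pre>
</div>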
<h2>Feature Representation Learning</h2>
<p>There are three categories: multi-scale feature learning, contextual reasoning, and deformable feature learning.</p>
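<p>As a taste of the multi-scale idea, here is a toy NumPy sketch (an illustration under simplifying assumptions, not FPN's actual implementation) of the top-down merge step used by feature pyramid approaches such as FPN (see the Feature Pyramid row in the table below): the coarse, semantically strong map is upsampled and added to the finer, spatially precise one.</p>
<div class="highlight">
<pre>import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_merge(coarse, fine):
    """Toy FPN-style merge: upsample the coarse map and add it element-wise
    to the finer map. A real FPN first applies 1x1 convs so channels match."""
    return fine + upsample2x(coarse)

c5 = np.random.rand(256, 8, 8)    # coarse, semantically strong level
c4 = np.random.rand(256, 16, 16)  # finer, spatially precise level
print(top_down_merge(c5, c4).shape)  # (256, 16, 16)
</pre>
</div>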
<p><img alt="multi-scale" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/c20d707f2e73d049f4a032cb38f78a90c4ea9f68/images/object-detection/multi-scale.png"/></p>
<p>Feature representation learning approaches at a glance:</p>
<table>
<tr><th>Category</th><th>Approach</th><th>Summary</th></tr>
<tr><td rowspan="4">Multi-scale feature learning</td><td>Image Pyramid</td><td>Resize input images into a number of different scales (an image pyramid) and train multiple detectors, each responsible for a certain range of scales. Example: Scale Normalization for Image Pyramids (SNIP).</td></tr>
<tr><td>Integrated Features</td><td>Construct a single feature map by combining features from multiple layers and make the final predictions based on the newly constructed map. Examples: Inside-Outside Network (ION), HyperNet, Multi-scale Location-aware Kernel Representation (MLKP).</td></tr>
<tr><td>Prediction Pyramid</td><td>Predictions are made from multiple layers, where each layer is responsible for a certain scale of objects. Examples: SSD, Receptive Field Block Net (RFBNet).</td></tr>
<tr><td>Feature Pyramid</td><td>Combines the advantages of Integrated Features and the Prediction Pyramid. Example: Feature Pyramid Network (FPN).</td></tr>
<tr><td>Region Feature Encoding</td><td>ROI Pooling</td><td>Extracted features from the down-sampled feature map and, as a result, struggled to handle small objects.</td></tr>
</table>
<title>工作表.14</title>
<desc>Extracted features from the down-sampled feature map and as a...</desc> <v:textblock v:margins="rect(4,4,4,4)"></v:textblock> <v:textrect cx="142.099" cy="817.908" height="25.5513" width="284.2"></v:textrect> <rect height="25.5513" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="284.198" x="0" y="805.132"></rect> <text style="fill: #ff0000; font-family: Calibri; font-size: 0.833336em;" v:langid="1033" x="4" y="814.91"><v:paragraph v:bulletsize="0.166667"></v:paragraph><v:tablist></v:tablist>Extracted features from the down-sampled feature map and as a <tspan dy="1.2em" style="font-size: 1em;" x="4">result struggled to handle small objects. </tspan> </text> </g> <g id="shape15-66" transform="translate(87.093,-444.743)" v:groupcontext="shape" v:mid="15">
<title>工作表.15</title>
<desc>ROI Warping</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="38.9744" cy="805.727" height="49.9126" width="77.95"></v:textrect> <rect height="49.9126" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="77.9488" x="0" y="780.771"></rect> <text style="fill: #000000; font-family: Calibri; font-size: 1.00001em;" v:langid="1033" x="9.42" y="809.33"><v:paragraph v:horizalign="1"></v:paragraph><v:tablist></v:tablist>ROI Warping</text> </g> <g id="shape16-69" transform="translate(165.042,-444.743)" v:groupcontext="shape" v:mid="16">
<title>工作表.16</title>
<desc>Encoded region features via bilinear interpolation. Due to th...</desc> <v:textblock v:margins="rect(4,4,4,4)"></v:textblock> <v:textrect cx="142.099" cy="805.727" height="49.9126" width="284.2"></v:textrect> <rect height="49.9126" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="284.198" x="0" y="780.771"></rect> <text style="fill: #000000; font-family: Calibri; font-size: 0.833336em;" v:langid="1033" x="4" y="790.73"><v:paragraph v:bulletsize="0.166667"></v:paragraph>Encoded region features via bilinear interpolation. <tspan style="fill: #ff0000; font-size: 1em;">Due to the </tspan><tspan dy="1.2em" style="fill: #ff0000; font-size: 1em;" x="4">downsampling operation in DCNN, there can be a misalignment between </tspan><tspan dy="1.2em" style="fill: #ff0000; font-size: 1em;" x="4">the object position in the original image and the downsampled </tspan><tspan dy="1.2em" style="fill: #ff0000; font-size: 1em;" x="4">feature maps.</tspan></text> </g> <g id="shape17-76" transform="translate(87.0933,-403.266)" v:groupcontext="shape" v:mid="17">
<title>工作表.17</title>
<desc>ROI Align</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="38.9744" cy="809.945" height="41.477" width="77.95"></v:textrect> <rect height="41.477" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="77.9488" x="0" y="789.207"></rect> <text style="fill: #000000; font-family: Calibri; font-size: 1.00001em;" v:langid="1033" x="16.67" y="813.55"><v:paragraph v:horizalign="1"></v:paragraph><v:tablist></v:tablist>ROI Align</text> </g> <g id="shape18-79" transform="translate(165.042,-403.266)" v:groupcontext="shape" v:mid="18">
<title>工作表.18</title>
<desc>Addressed the quantization issue by bilinear interpolation at...</desc> <v:textblock v:margins="rect(4,4,4,4)"></v:textblock> <v:textrect cx="142.099" cy="809.945" height="41.477" width="284.2"></v:textrect> <rect height="41.477" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="284.198" x="0" y="789.207"></rect> <text style="fill: #198742; font-family: Calibri; font-size: 0.833336em;" v:langid="1033" x="4" y="806.95"><v:paragraph v:bulletsize="0.166667"></v:paragraph><v:tablist></v:tablist>Addressed the quantization issue by bilinear interpolation at <tspan dy="1.2em" style="font-size: 1em;" x="4">fractionally sampled positions within each grid.</tspan></text> </g> <g id="shape19-83" transform="translate(87.0937,-344.892)" v:groupcontext="shape" v:mid="19">
<title>工作表.19</title>
<desc>Precise ROI Pooling (PrROI Pooling)</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="38.9744" cy="801.497" height="58.3742" width="77.95"></v:textrect> <rect height="58.3742" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="77.9488" x="0" y="772.31"></rect> <text style="fill: #000000; font-family: Calibri; font-size: 1.00001em;" v:langid="1033" x="11.45" y="790.7"><v:paragraph v:horizalign="1"></v:paragraph><v:tablist></v:tablist>Precise ROI <tspan dy="1.2em" style="font-size: 1em;" x="5.08">Pooling (PrROI </tspan><tspan dy="1.2em" style="font-size: 1em;" x="19">Pooling)</tspan></text> </g> <g id="shape20-88" transform="translate(165.043,-344.892)" v:groupcontext="shape" v:mid="20">
<title>工作表.20</title>
<desc>Avoided any quantization of coordinates and had a continuous ...</desc> <v:textblock v:margins="rect(4,4,4,4)"></v:textblock> <v:textrect cx="142.099" cy="801.497" height="58.3742" width="284.2"></v:textrect> <rect height="58.3742" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="284.198" x="0" y="772.31"></rect> <text style="fill: #198742; font-family: Calibri; font-size: 0.833336em;" v:langid="1033" x="4" y="798.5"><v:paragraph v:bulletsize="0.166667"></v:paragraph><v:tablist></v:tablist>Avoided any quantization of coordinates and had a continuous <tspan dy="1.2em" style="font-size: 1em;" x="4">gradient on bounding box coordinates.</tspan></text> </g> <g id="shape21-92" transform="translate(87.093,-273.492)" v:groupcontext="shape" v:mid="21">
<title>工作表.21</title>
<desc>Position Sensitive ROI Pooling (PSROI Pooling)</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="38.9744" cy="794.984" height="71.4001" width="77.95"></v:textrect> <rect height="71.4001" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="77.9488" x="0" y="759.284"></rect> <text style="fill: #000000; font-family: Calibri; font-size: 1.00001em;" v:langid="1033" x="19.28" y="776.98"><v:paragraph v:horizalign="1"></v:paragraph><v:tablist></v:tablist>Position <tspan dy="1.2em" style="font-size: 1em;" x="7.18">Sensitive ROI </tspan><tspan dy="1.2em" style="font-size: 1em;" x="4.42">Pooling (PSROI </tspan><tspan dy="1.2em" style="font-size: 1em;" x="19">Pooling)</tspan></text> </g> <g id="shape22-98" transform="translate(165.043,-273.492)" v:groupcontext="shape" v:mid="22">
<title>工作表.22</title>
<desc>Enhanced spatial information of the downsampled region features</desc> <v:textblock v:margins="rect(4,4,4,4)"></v:textblock> <v:textrect cx="142.099" cy="794.984" height="71.4001" width="284.2"></v:textrect> <rect height="71.4001" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="284.198" x="0" y="759.284"></rect> <text style="fill: #198742; font-family: Calibri; font-size: 0.833336em;" v:langid="1033" x="4" y="797.98"><v:paragraph v:bulletsize="0.166667"></v:paragraph><v:tablist></v:tablist>Enhanced spatial information of the downsampled region features.</text> </g> <g id="shape23-101" transform="translate(87.0932,-218.331)" v:groupcontext="shape" v:mid="23">
<title>工作表.23</title>
<desc>CoupleNet</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="38.9744" cy="803.103" height="55.161" width="77.95"></v:textrect> <rect height="55.161" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="77.9488" x="0" y="775.523"></rect> <text style="fill: #000000; font-family: Calibri; font-size: 1.00001em;" v:langid="1033" x="13.08" y="806.7"><v:paragraph v:horizalign="1"></v:paragraph><v:tablist></v:tablist>CoupleNet</text> </g> <g id="shape24-104" transform="translate(165.043,-218.331)" v:groupcontext="shape" v:mid="24">
<title>工作表.24</title>
<desc>Combined outputs generated from both ROI Pooling layer and P...</desc> <v:textblock v:margins="rect(4,4,4,4)"></v:textblock> <v:textrect cx="142.099" cy="803.103" height="55.161" width="284.2"></v:textrect> <rect height="55.161" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="284.198" x="0" y="775.523"></rect> <text style="fill: #198742; font-family: Calibri; font-size: 0.833336em;" v:langid="1033" x="4" y="788.1"><v:paragraph v:bulletsize="0.166667"></v:paragraph><v:tablist></v:tablist>Combined outputs generated from both ROI Pooling layer and <tspan dy="1.2em" style="font-size: 1em;" x="4">PSROI Pooling layer. ROI Pooling layer extracted global region </tspan><tspan dy="1.2em" style="font-size: 1em;" x="4">information but struggled with objects with high occlusion while </tspan><tspan dy="1.2em" style="font-size: 1em;" x="4">PSROI Pooling layer focused more on local information.</tspan></text> </g> <g id="shape25-110" transform="translate(87.0932,-165.331)" v:groupcontext="shape" v:mid="25">
<title>工作表.25</title>
<desc>Deformable ROI Pooling</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="38.9744" cy="804.184" height="52.9999" width="77.95"></v:textrect> <rect height="52.9999" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="77.9488" x="0" y="777.684"></rect> <text style="fill: #000000; font-family: Calibri; font-size: 1.00001em;" v:langid="1033" x="11.39" y="800.58"><v:paragraph v:horizalign="1"></v:paragraph><v:tablist></v:tablist>Deformable <tspan dy="1.2em" style="font-size: 1em;" x="10.72">ROI Pooling</tspan></text> </g> <g id="shape26-114" transform="translate(165.043,-165.331)" v:groupcontext="shape" v:mid="26">
<title>工作表.26</title>
<desc>Can automatically model the image content without being const...</desc> <v:textblock v:margins="rect(4,4,4,4)"></v:textblock> <v:textrect cx="142.099" cy="804.184" height="52.9999" width="284.2"></v:textrect> <rect height="52.9999" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="284.198" x="0" y="777.684"></rect> <text style="fill: #198742; font-family: Calibri; font-size: 0.833336em;" v:langid="1033" x="4" y="801.18"><v:paragraph v:bulletsize="0.166667"></v:paragraph><v:tablist></v:tablist>Can automatically model the image content without being <tspan dy="1.2em" style="font-size: 1em;" x="4">constrained by fixed receptive fields.</tspan></text> </g> <g id="shape27-118" transform="translate(87.0932,-76.1936)" v:groupcontext="shape" v:mid="27">
<title>工作表.27</title>
<desc>Learning the relationship between objects and their surround...</desc> <v:textblock v:margins="rect(4,4,4,4)"></v:textblock> <v:textrect cx="181.074" cy="786.115" height="89.1377" width="362.15"></v:textrect> <rect height="89.1377" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="362.147" x="0" y="741.546"></rect> <text style="fill: #459b5c; font-family: Calibri; font-size: 0.833336em;" v:langid="1033" x="4" y="771.12"><v:paragraph v:bulletsize="0.166667"></v:paragraph><v:tablist></v:tablist>Learning the relationship between objects and their surrounding context can improve the <tspan dy="1.2em" style="font-size: 1em;" x="4">detector</tspan>’s ability to understand the scenario.<tspan style="fill: #000000; font-size: 1em;"> </tspan><tspan style="fill: #000000; font-size: 1em;">Two aspects: global context and region </tspan><tspan dy="1.2em" style="fill: #000000; font-size: 1em;" x="4">context. </tspan><tspan style="fill: #000000; font-size: 1em;">Examples: Spatial Memory Network (SMN), Structure Inference Net (SIN), </tspan><tspan dy="1.2em" style="fill: #000000; font-size: 1em;" x="4">Gated Bi</tspan><tspan style="fill: #000000; font-size: 1em;">-</tspan><tspan style="fill: #000000; font-size: 1em;">Directional CNN (GBDNet).</tspan></text> </g> <g id="shape28-129" transform="translate(87.0932,-18.375)" v:groupcontext="shape" v:mid="28">
<title>工作表.28</title>
<desc>Robust to nonrigid deformation of objects. Examples: DeepIDNe...</desc> <v:textblock v:margins="rect(4,4,4,4)"></v:textblock> <v:textrect cx="181.074" cy="801.774" height="57.8192" width="362.15"></v:textrect> <rect height="57.8192" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="362.147" x="0" y="772.865"></rect> <text style="fill: #459b5c; font-family: Calibri; font-size: 0.833336em;" v:langid="1033" x="4" y="792.77"><v:paragraph v:bulletsize="0.166667"></v:paragraph><v:tablist></v:tablist>Robust to nonrigid deformation of objects.<tspan style="fill: #000000; font-size: 1em;"> </tspan><tspan style="fill: #000000; font-size: 1em;">Examples: DeepIDNet developed a </tspan><tspan dy="1.2em" style="fill: #000000; font-size: 1em;" x="4">deformable</tspan><tspan style="fill: #000000; font-size: 1em;">-</tspan><tspan style="fill: #000000; font-size: 1em;">aware pooling layer to encode the deformation information across </tspan><tspan dy="1.2em" style="fill: #000000; font-size: 1em;" x="4">different object categories.</tspan></text> </g> </g> </svg></p>
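<p>To make the contrast between ROI Pooling and ROI Align above more concrete, here is a minimal sketch using torchvision's built-in ops; the feature map shape, stride and box coordinates are made-up values for illustration:</p>
<pre><code>import torch
from torchvision.ops import roi_pool, roi_align

# A fake 1x256x50x50 feature map, e.g. from a backbone with stride 16.
features = torch.randn(1, 256, 50, 50)

# One region of interest in image coordinates: (batch_index, x1, y1, x2, y2).
# 123.4 / 16 lands between feature-map cells, which is where the two ops differ.
rois = torch.tensor([[0.0, 123.4, 87.6, 456.7, 321.0]])

# ROI Pooling quantizes the box to integer feature-map coordinates first,
# so the extracted 7x7 features can be misaligned by a fraction of a cell.
pooled = roi_pool(features, rois, output_size=(7, 7), spatial_scale=1 / 16)

# ROI Align keeps the fractional coordinates and bilinearly interpolates
# feature values at sampled positions inside each of the 7x7 bins.
aligned = roi_align(features, rois, output_size=(7, 7),
                    spatial_scale=1 / 16, sampling_ratio=2)

print(pooled.shape, aligned.shape)  # both torch.Size([1, 256, 7, 7])
</code></pre>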
<h2>Learning Strategy</h2>
<p>These strategies tackle problems such as imbalance sampling, localization refinement, and model acceleration, at both training and testing time.</p>
<table>
<thead>
<tr><th>Stage</th><th>Strategy</th><th>Notes</th></tr>
</thead>
<tbody>
<tr><td rowspan="5">Training Stage</td><td>Data Augmentation</td><td>Horizontal flips of training images are used in training the Faster R-CNN detector. A more intensive data augmentation strategy is used in one-stage detectors, including rotation, random crops, expanding and color jittering.</td></tr>
<tr><td>Imbalance Sampling</td><td><strong>Hard negative sampling</strong>: negative proposals with higher classification loss were selected for training.<br/><strong>Focal loss</strong>: the gradient signals of easy samples got suppressed, which led the training process to focus more on hard proposals.<br/><strong>Gradient harmonizing mechanism (GHM)</strong>: not only suppressed easy proposals but also avoided the negative impact of outliers.</td></tr>
<tr><td>Localization Refinement</td><td>Examples: LocNet, MultiPath Network, FitnessNMS. Grid R-CNN replaced the linear bounding box regressor with a corner-keypoint-based localization mechanism.</td></tr>
<tr><td>Cascade Learning</td><td>A coarse-to-fine learning strategy that collects information from the outputs of the given classifiers to build stronger classifiers in a cascaded manner. RefineDet and Cascade R-CNN utilized cascade learning methods to refine object locations.</td></tr>
<tr><td>Others</td><td><strong>Adversarial learning</strong>: Perceptual GAN for small object detection learned high-resolution feature representations of small objects via an adversarial scheme.<br/><strong>Training from scratch</strong>, for two reasons: the bias of loss functions and data distribution between classification and detection can have an adverse impact on performance, and transferring a classification model to detection in a new domain can bring further challenges. Examples: DSOD (Deeply Supervised Object Detectors), gated recurrent feature pyramid.<br/><strong>Knowledge distillation</strong>: distills the knowledge in an ensemble of models into a single model via a teacher-student training scheme.</td></tr>
<tr><td rowspan="2">Testing Stage</td><td>Duplicate Removal</td><td><strong>Non-maximum suppression (NMS)</strong>: <span style="color: red;">the predefined threshold can result in missing predictions, a scenario that is very common in clustered object detection.</span><br/><strong>Soft-NMS</strong>: decayed the confidence score of a box as a continuous function of its overlap with the higher-scoring box. <span style="color: green;">Avoided eliminating predictions of clustered objects.</span></td></tr>
<tr><td>Model Acceleration</td><td>Examples: R-FCN, Light Head R-CNN, MobileNet with depth-wise convolution layers. Models can also be optimized offline, e.g. via model compression and quantization, or with an acceleration toolkit such as TensorRT.</td></tr>
</tbody>
</table>
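<p>Of the training strategies above, the focal loss is compact enough to write out in full. Below is a minimal binary-classification sketch in PyTorch; the gamma and alpha defaults follow the values commonly quoted for RetinaNet, and the function itself is my own illustration rather than a library API:</p>
<pre><code>import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Down-weights easy examples so training focuses on hard ones.
    targets are 0/1 floats with the same shape as logits."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class balancing weight
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# An easy negative (confidently correct) contributes almost nothing, while a
# hard negative (confidently wrong) dominates the loss.
logits = torch.tensor([-5.0, 3.0])   # raw model outputs for two anchors
targets = torch.tensor([0.0, 0.0])   # both anchors are background
print(focal_loss(logits, targets))
</code></pre>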
<h2>Conclusion and further thoughts</h2>
<p>This series gives you an overview of several critical components you might find in a deep learning object detector, as well as how they build upon each other. Finally, let's conclude the series with the network structure of Faster R-CNN with FPN.</p>
<p><img alt="two-stage-network" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/6d13d8be896a6a5f20c36d357f92cd44beeb62dc/images/object-detection/two-stage-network.png"/></p>Recent Advances in Deep Learning for Object Detection - Part 12019-09-01T03:45:07+00:002024-03-19T02:45:52+00:00Chengweihttps://www.dlology.com/blog/author/Chengwei/https://www.dlology.com/blog/recent-advances-in-deep-learning-for-object-detection/<p><img alt="advance" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/e4548fcb546412476cf8d9664224f14db8590743/images/object-detection/advance.png"/></p>
<p><span class="fontstyle0">When comes to training a custom object detection model, <a href="https://www.dlology.com/blog/how-to-train-an-object-detection-model-easy-for-free/">TensorFlow object detection API</a> and <a href="https://www.dlology.com/blog/how-to-train-an-object-detection-model-with-mmdetection/">MMdetection</a>(PyTorch) are two readily available options, I have shown you how to do that even on Google Colab's free GPU resources. </span></p>
<p><span class="fontstyle0">Those two frameworks are easy to use with simple configuration interface and let the framework source code does the heavy lifting. But do you ever wonder how the deep learning object detection algorithms are evolved over the years, their pros and cons?</span></p>
<p><span class="fontstyle0"></span>I find the paper - <a href="https://arxiv.org/pdf/1908.03673.pdf">Recent Advances in Deep Learning for Object Detection</a> a really good answer to this quest. Let me summarize what I have learned, hopefully, elaborate in a more intuitive way.</p>
<p><em>Text colors: <strong style="color: green;">pros</strong>/<strong style="color: red;">cons</strong></em></p>
<h2><span class="fontstyle0">Detection Paradigms</span></h2>
<h3>Two-stage Detectors</h3>
<p><img alt="two-stage" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/e4548fcb546412476cf8d9664224f14db8590743/images/object-detection/two-stage.png"/></p>
<table>
<thead>
<tr><th>Detector</th><th>Pros / Cons</th></tr>
</thead>
<tbody>
<tr><td>R-CNN</td><td><span style="color: red;">The whole detection framework could not be optimized in an end-to-end manner, making it difficult to obtain a globally optimal solution.</span></td></tr>
<tr><td>SPP-net (Spatial Pyramid Pooling)</td><td><span style="color: green;">The Spatial Pyramid Pooling (SPP) layer achieved better results and had a significantly faster inference speed.</span> <span style="color: red;">However, the training of SPP-net was still multi-stage and thus could not be optimized end-to-end. The SPP layer did not back-propagate gradients to the convolutional kernels, so all the parameters before the SPP layer were frozen, which limited the learning capability of deep backbone architectures.</span></td></tr>
<tr><td>Fast R-CNN</td><td><span style="color: green;">ROI pooling layer: the feature extraction, region classification and bounding box regression steps can all be optimized end-to-end, without extra cache space to store features.</span> <span style="color: red;">The proposal generation step still relied on traditional methods such as Selective Search or Edge Boxes.</span></td></tr>
<tr><td>Faster R-CNN</td><td><span style="color: green;">Region Proposal Network (RPN): the whole framework can be optimized in an end-to-end manner on training data.</span> <span style="color: red;">The computation was not shared in the region classification step, where each feature vector still needed to go through a sequence of FC layers separately. It also used a single deep-layer feature map to make the final prediction, which made it difficult to detect objects at different scales.</span></td></tr>
<tr><td>Region-based Fully Convolutional Networks (R-FCN)</td><td><span style="color: green;">Achieved competitive results compared to Faster R-CNN without region-wise fully connected layer operations, thanks to its Position-Sensitive Score Map.</span></td></tr>
<tr><td>Feature Pyramid Networks (FPN)</td><td><span style="color: green;">Enables object detection in feature maps at different scales.</span></td></tr>
</tbody>
</table>
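<p>To see the two-stage paradigm run end to end, here is a minimal inference sketch with torchvision's pre-trained Faster R-CNN + FPN model; the weights download on first use, and the exact <code>pretrained</code>/<code>weights</code> argument may differ across torchvision versions:</p>
<pre><code>import torch
import torchvision

# Faster R-CNN with a ResNet-50 FPN backbone: the RPN proposes regions,
# then the ROI heads classify and refine each proposal.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

# The model takes a list of 3xHxW float tensors scaled to [0, 1];
# a random image stands in for a real one here.
image = torch.rand(3, 480, 640)
with torch.no_grad():
    predictions = model([image])

# Each prediction is a dict with boxes (x1, y1, x2, y2), COCO labels and scores.
print(predictions[0]["boxes"].shape, predictions[0]["scores"][:5])
</code></pre>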
<h3>One-stage Detectors</h3>
<p><img alt="one-stage" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/e4548fcb546412476cf8d9664224f14db8590743/images/object-detection/one-stage.png"/></p>
<p><svg color-interpolation-filters="sRGB" style="fill: none; fill-rule: evenodd; font-size: 12px; overflow: visible; stroke-linecap: square; stroke-miterlimit: 3;" viewbox="0 0 512.97 589.506" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:v="http://schemas.microsoft.com/visio/2003/SVGExtensions/" xmlns:xlink="http://www.w3.org/1999/xlink"> <v:documentproperties v:langid="1033" v:metric="true" v:viewmarkup="false"> <v:userdefs> <v:ud v:nameu="msvNoAutoConnect" v:val="VT0(1):26"></v:ud> </v:userdefs> </v:documentproperties>
<style type="text/css"><!--
.st1 {fill:#ffffff;stroke:#000000;stroke-linecap:round;stroke-linejoin:round;stroke-width:0.75}
.st2 {fill:#000000;font-family:Calibri;font-size:1.08334em}
.st3 {fill:#198742;font-family:Calibri;font-size:1.08334em}
.st4 {font-size:1em}
.st5 {fill:#ff0000;font-size:1em}
.st6 {fill:#459b5c;font-family:Calibri;font-size:1.08334em}
.st7 {fill:#00882b;font-family:Calibri;font-size:1.08334em}
.st8 {font-size:1em;font-weight:bold}
.st9 {fill:#00882b;font-family:Calibri;font-size:1.08334em;font-weight:bold}
.st10 {font-size:1em;font-weight:normal}
.st11 {fill:none;fill-rule:evenodd;font-size:12px;overflow:visible;stroke-linecap:square;stroke-miterlimit:3}
--></style>
<g v:groupcontext="foregroundPage" v:index="38" v:mid="113">
<title>adv2</title>
<v:pageproperties v:drawingscale="0.0393701" v:drawingunits="24" v:pagescale="0.0393701" v:shadowoffsetx="8.50394" v:shadowoffsety="-8.50394"></v:pageproperties> <g id="shape1-1" transform="translate(18.375,-471.385)" v:groupcontext="shape" v:mid="1">
<title>工作表.1</title>
<desc>OverFeat</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="59.5276" cy="539.633" height="99.7458" width="119.06"></v:textrect> <rect height="99.7458" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="119.055" x="0" y="489.76"></rect> <text style="fill: #000000; font-family: Calibri; font-size: 1.08334em;" v:langid="1033" x="36.75" y="543.53"><v:paragraph v:horizalign="1"></v:paragraph><v:tablist></v:tablist>OverFeat</text> </g> <g id="shape2-4" transform="translate(137.43,-471.384)" v:groupcontext="shape" v:mid="2">
<title>工作表.2</title>
<desc>In order to detect multiscale objects, the input image was re...</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="178.583" cy="539.633" height="99.7467" width="357.17"></v:textrect> <rect height="99.7467" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="357.165" x="0" y="489.759"></rect> <text style="fill: #198742; font-family: Calibri; font-size: 1.08334em;" v:langid="1033" x="6.94" y="512.33"><v:paragraph></v:paragraph><v:tablist></v:tablist>In order to detect multiscale objects, the input image was resized <tspan dy="1.2em" style="font-size: 1em;" x="4">into multiple scales which were fed into the network. Finally, the </tspan><tspan dy="1.2em" style="font-size: 1em;" x="4">predictions across all the scales were merged together.<v:newlinechar></v:newlinechar></tspan><tspan dy="1.2em" style="fill: #ff0000; font-size: 1em;" x="4">However, the training of classifiers and regressors was separated </tspan><tspan dy="1.2em" style="fill: #ff0000; font-size: 1em;" x="4">without being jointly optimized.</tspan></text> </g> <g id="shape3-11" transform="translate(18.375,-347.256)" v:groupcontext="shape" v:mid="3">
<title>工作表.3</title>
<desc>YOLO</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="59.5276" cy="527.441" height="124.129" width="119.06"></v:textrect> <rect height="124.129" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="119.055" x="0" y="465.377"></rect> <text style="fill: #000000; font-family: Calibri; font-size: 1.08334em;" v:langid="1033" x="45.02" y="531.34"><v:paragraph v:horizalign="1"></v:paragraph><v:tablist></v:tablist>YOLO</text> </g> <g id="shape4-14" transform="translate(137.43,-347.256)" v:groupcontext="shape" v:mid="4">
<title>工作表.4</title>
<desc>Divided the whole image into a fixed number of grid cells. YOLO...</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="178.583" cy="527.441" height="124.129" width="357.17"></v:textrect> <rect height="124.129" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="357.165" x="0" y="465.377"></rect> <text style="fill: #459b5c; font-family: Calibri; font-size: 1.08334em;" v:langid="1033" x="4" y="492.34"><v:paragraph></v:paragraph><v:tablist></v:tablist>Divided the whole image into a fixed number of grid cells.<v:newlinechar></v:newlinechar><tspan dy="1.2em" style="fill: #ff0000; font-size: 1em;" x="4">YOLO faced some challenges: i) it could detect up to only two </tspan><tspan dy="1.2em" style="fill: #ff0000; font-size: 1em;" x="4">objects at a given location, which made it difficult to detect small </tspan><tspan dy="1.2em" style="fill: #ff0000; font-size: 1em;" x="4">objects and crowded objects. ii) only the last feature map was </tspan><tspan dy="1.2em" style="fill: #ff0000; font-size: 1em;" x="4">used for prediction, which was not suitable for predicting objects </tspan><tspan dy="1.2em" style="fill: #ff0000; font-size: 1em;" x="4">at multiple scales and aspect ratios.</tspan></text> </g> <g id="shape5-22" transform="translate(18.375,-250.068)" v:groupcontext="shape" v:mid="5">
<title>工作表.5</title>
<desc>Single-Shot Mulibox Detector (SSD)</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="59.5276" cy="540.912" height="97.1885" width="119.06"></v:textrect> <rect height="97.1885" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="119.055" x="0" y="492.317"></rect> <text style="fill: #000000; font-family: Calibri; font-size: 1.08334em;" v:langid="1033" x="6.77" y="537.01"><v:paragraph v:horizalign="1"></v:paragraph><v:tablist></v:tablist>Single-Shot Mulibox <tspan dy="1.2em" style="font-size: 1em;" x="20.88">Detector (SSD)</tspan></text> </g> <g id="shape6-26" transform="translate(137.43,-250.068)" v:groupcontext="shape" v:mid="6">
<title>工作表.6</title>
<desc>A set of anchors with multiple scales and aspect-ratios were ...</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="178.583" cy="540.912" height="97.1885" width="357.17"></v:textrect> <rect height="97.1885" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="357.165" x="0" y="492.317"></rect> <text style="fill: #00882b; font-family: Calibri; font-size: 1.08334em;" v:langid="1033" x="4" y="505.81"><v:paragraph></v:paragraph><v:tablist></v:tablist>A set of <tspan style="font-size: 1em; font-weight: bold;">anchors </tspan>with multiple scales and aspect-ratios were <tspan dy="1.2em" style="font-size: 1em;" x="4">generated to discretize the output space of bounding boxes, </tspan><tspan dy="1.2em" style="font-size: 1em;" x="4">predicted objects on multiple feature maps. (</tspan><tspan style="font-size: 1em; font-weight: bold;">multiple scales</tspan>), <tspan dy="1.2em" style="font-size: 1em;" x="4">hard negative mining.</tspan><tspan dy="1.2em" style="fill: #ff0000; font-size: 1em;" x="4">The class imbalance between foreground and background is a </tspan><tspan dy="1.2em" style="fill: #ff0000; font-size: 1em;" x="4">severe problem in one</tspan><tspan style="fill: #ff0000; font-size: 1em;">-</tspan><tspan style="fill: #ff0000; font-size: 1em;">stage detector.</tspan></text> </g> <g id="shape7-38" transform="translate(18.375,-153.306)" v:groupcontext="shape" v:mid="7">
<title>工作表.7</title>
<desc>RetinaNet</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="59.5276" cy="541.125" height="96.7616" width="119.06"></v:textrect> <rect height="96.7616" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="119.055" x="0" y="492.744"></rect> <text style="fill: #000000; font-family: Calibri; font-size: 1.08334em;" v:langid="1033" x="32.96" y="545.03"><v:paragraph v:horizalign="1"></v:paragraph><v:tablist></v:tablist>RetinaNet</text> </g> <g id="shape8-41" transform="translate(137.43,-153.306)" v:groupcontext="shape" v:mid="8">
<title>工作表.8</title>
<desc>Focal loss which suppressed the gradients of easy negative sa...</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="178.583" cy="541.125" height="96.7616" width="357.17"></v:textrect> <rect height="96.7616" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="357.165" x="0" y="492.744"></rect> <text style="fill: #00882b; font-family: Calibri; font-size: 1.08334em; font-weight: bold;" v:langid="1033" x="4" y="513.83"><v:paragraph></v:paragraph><v:tablist></v:tablist>Focal loss<tspan style="font-size: 1em; font-weight: normal;"> </tspan><tspan style="font-size: 1em; font-weight: normal;">which suppressed the gradients of easy negative </tspan><tspan dy="1.2em" style="font-size: 1em; font-weight: normal;" x="4">samples instead of simply discarding them.used </tspan>feature pyramid <tspan dy="1.2em" style="font-size: 1em;" x="4">networks</tspan><tspan style="font-size: 1em; font-weight: normal;"> </tspan><tspan style="font-size: 1em; font-weight: normal;">to detect multi</tspan><tspan style="font-size: 1em; font-weight: normal;">-</tspan><tspan style="font-size: 1em; font-weight: normal;">scale objects at different levels of </tspan><tspan dy="1.2em" style="font-size: 1em; font-weight: normal;" x="4">feature maps.</tspan><v:newlinechar></v:newlinechar></text> </g> <g id="shape9-53" transform="translate(18.375,-80.6895)" v:groupcontext="shape" v:mid="9">
<title>工作表.9</title>
<desc>YOLOv2</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="59.5276" cy="553.411" height="72.1896" width="119.06"></v:textrect> <rect height="72.1896" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="119.055" x="0" y="517.316"></rect> <text style="fill: #000000; font-family: Calibri; font-size: 1.08334em;" v:langid="1033" x="38.79" y="557.31"><v:paragraph v:horizalign="1"></v:paragraph><v:tablist></v:tablist>YOLOv2</text> </g> <g id="shape10-56" transform="translate(137.43,-80.6895)" v:groupcontext="shape" v:mid="10">
<title>工作表.10</title>
<desc>Adopted a more powerful deep convolutional backbone. YOLOv2 d...</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="178.583" cy="553.411" height="72.1896" width="357.17"></v:textrect> <rect height="72.1896" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="357.165" x="0" y="517.316"></rect> <text style="fill: #00882b; font-family: Calibri; font-size: 1.08334em;" v:langid="1033" x="4" y="533.91"><v:paragraph></v:paragraph><v:tablist></v:tablist>Adopted a more powerful deep convolutional backbone. YOLOv2 <tspan dy="1.2em" style="font-size: 1em;" x="4">defined better anchor priors by k</tspan>-means clustering from the <tspan dy="1.2em" style="font-size: 1em;" x="4">training data (instead of setting manually). multi</tspan>-scale training <tspan dy="1.2em" style="font-size: 1em;" x="4">techniques.</tspan></text> </g> <g id="shape11-62" transform="translate(137.43,-18.375)" v:groupcontext="shape" v:mid="11">
<title>工作表.11</title>
<desc>Anchor-free object, the goal was to predict keypoints of the ...</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="178.583" cy="558.349" height="62.3145" width="357.17"></v:textrect> <rect height="62.3145" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="357.165" x="0" y="527.191"></rect> <text style="fill: #00882b; font-family: Calibri; font-size: 1.08334em;" v:langid="1033" x="4" y="554.45"><v:paragraph></v:paragraph><v:tablist></v:tablist>Anchor-free object, the goal was to predict key points of the <tspan dy="1.2em" style="font-size: 1em;" x="4">bounding box, instead of trying to fit an object to an anchor.</tspan></text> </g> <g id="shape12-66" transform="translate(18.375,-18.375)" v:groupcontext="shape" v:mid="12">
<title>工作表.12</title>
<desc>CornerNet</desc> <v:textblock v:margins="rect(4,4,4,4)" v:tabspace="42.5197"></v:textblock> <v:textrect cx="59.5276" cy="558.349" height="62.3145" width="119.06"></v:textrect> <rect height="62.3145" style="fill: #ffffff; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-width: 0.75;" width="119.055" x="0" y="527.191"></rect> <text style="fill: #000000; font-family: Calibri; font-size: 1.08334em;" v:langid="1033" x="33.31" y="562.25"><v:paragraph v:horizalign="1"></v:paragraph><v:tablist></v:tablist>CornerNet</text> </g> </g> </svg></p>
<h3>Backbone Architecture</h3>
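<p>Before the summary table below, here is a minimal Keras sketch of the key structural difference between the first few backbones: ResNet merges a shortcut connection by element-wise addition, while DenseNet concatenates the input with the new features. This is an illustration only, not the actual ResNet/DenseNet block definitions; the filter counts and input shape are arbitrary.</p>
<div class="highlight">
<pre>from tensorflow.keras import Input, Model, layers

def residual_block(x, filters=64):
    # ResNet: learn a residual, merge it with the input by element-wise addition
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Add()([x, y])           # assumes x already has `filters` channels

def dense_step(x, growth=32):
    # DenseNet: concatenate new features with the input, keeping shallow features
    y = layers.Conv2D(growth, 3, padding="same", activation="relu")(x)
    return layers.Concatenate()([x, y])   # channel count grows by `growth`

inputs = Input(shape=(224, 224, 64))
model = Model(inputs, dense_step(residual_block(inputs)))
</pre>
</div>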
<table>
<tr><th>Backbone</th><th>Key ideas and limitations</th></tr>
<tr><td>VGG16</td><td>Increasing depth led to better performance. Limitation: it also led to optimization challenges.</td></tr>
<tr><td>ResNet</td><td>Reduced optimization difficulties by introducing shortcut connections. Limitation: the shortcut connection did not fully utilize features from previous layers.</td></tr>
<tr><td>ResNet-v2</td><td>An appropriate ordering of Batch Normalization performed better than the original ResNet and made it possible to successfully train networks with more than 1000 layers.</td></tr>
<tr><td>DenseNet</td><td>Retained the shallow-layer features and improved information flow by <strong>concatenating</strong> the input with the residual output instead of element-wise addition.</td></tr>
<tr><td>Dual Path Network (DPN)</td><td>Integrated the advantages of both ResNet and DenseNet: divided the channels into two parts, one used for dense-connection computation and one for element-wise summation; the result was the concatenated output of the two branches.</td></tr>
<tr><td>ResNeXt</td><td>Considerably reduced computation and memory cost while maintaining comparable classification accuracy, by adopting <strong>group convolution layers</strong> which sparsely connect feature map channels.</td></tr>
<tr><td>MobileNet</td><td>Significantly reduced computation cost as well as the number of parameters without significant loss in classification accuracy.</td></tr>
<tr><td>GoogleNet</td><td>Increased model width to improve learning capacity by applying convolution kernels of different scales (1 × 1, 3 × 3 and 5 × 5) to the same feature map in a given layer.</td></tr>
<tr><td>DetNet</td><td>Kept high-resolution feature maps for prediction, with <strong>dilated convolutions</strong> to increase receptive fields; detected objects on multi-scale feature maps.</td></tr>
<tr><td>Hourglass Network</td><td>First appeared in the human pose recognition task. First downsampled the input image via a sequence of convolutional or pooling layers, then upsampled the feature maps via deconvolution; skip connections between downsampling and upsampling features avoid information loss in the downsampling stage.</td></tr>
</table>
<h2>Conclusion and further reading</h2>
<p>This quick post summarized recent advances in deep learning object detection from three aspects: two-stage detectors, one-stage detectors, and backbone architectures. Next time you train a custom object detector with a third-party open-source framework, you can select an optimal option for your application with more confidence by examining their pros and cons.</p>
<p>In the next post, I will finish up what we have left in the paper, namely proposal generation, feature representation learning, and learning strategy. If you are interested, I strongly suggest giving the <a href="https://arxiv.org/pdf/1908.03673.pdf">paper</a> a read; it will be well worth your time.</p>How to run Keras model on Jetson Nano in Nvidia Docker container2019-08-10T09:21:00+00:002024-03-18T14:44:22+00:00Chengweihttps://www.dlology.com/blog/author/Chengwei/https://www.dlology.com/blog/how-to-run-keras-model-on-jetson-nano-in-nvidia-docker-container/<p><img alt="docker_nano" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/a5f860d17b122866708e54f897363db1bed503f9/images/jetson/docker_nano.png"/></p>
<p>I wrote, "<a href="https://www.dlology.com/blog/how-to-run-keras-model-on-jetson-nano/">How to run Keras model on Jetson Nano</a>" a while back, where the model runs on the host OS. In this tutorial, I will show you how to start fresh and get the model running on Jetson Nano inside an Nvidia docker container.</p>
<p>You might wonder: why bother with Docker on the Jetson Nano? I came up with several reasons.</p>
<p>1. It's much easier to reproduce results with a Docker container than by installing all the dependencies/libraries yourself, since the Docker image you pull from Docker Hub has all dependencies preinstalled, which saves you tons of time building from source.</p>
<p>2. You're less likely to mess up the Jetson Nano host OS, since your code and dependencies are isolated from it. Even when you get into trouble, solving the issue is often just a matter of starting a new container.</p>
<p>3. You can build your applications on top of my base image with TensorFlow preinstalled, in a much more controllable way, by creating a new Dockerfile.</p>
<p>4. You can cross-compile the Docker image on a much more powerful computer, such as an X86-based server, which saves valuable time.</p>
<p>5. Finally, you guessed it: running code in a Docker container is almost as fast as running on the host OS, with GPU acceleration available.</p>
<p>I hope you are convinced; here is a brief overview of how to make it happen.</p>
<ul>
<li>Install new JetPack 4.2.1 on Jetson Nano.</li>
<li>Cross-compiling Docker build setup on an X86 machine.</li>
<li>Build a Jetson Nano docker with TensorFlow GPU.</li>
<li>Build an overlay Docker image (optional).</li>
<li>Run the frozen Keras TensorRT model in a Docker container.</li>
</ul>
<h2>Install new JetPack 4.2.1 on Jetson Nano</h2>
<p>Download the <a href="https://developer.nvidia.com/embedded/jetpack">JetPack 4.2.1 SD card image</a> from Nvidia. Extract the <strong>sd-blob-b01.img</strong> file from the zip, and flash it to a class 10 SD card with at least 32GB capacity using <a href="https://rufus.ie/">Rufus</a>. The SD card I used is a SanDisk class 10 U1 64GB model.</p>
<p><img alt="rufus" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/a5f860d17b122866708e54f897363db1bed503f9/images/jetson/rufus.png"/></p>
<p>You can try another flasher like <a href="https://www.balena.io/etcher/">Etcher</a>, but the SD card I flashed with Etcher could not boot on the Jetson Nano. I also tried installing the JetPack with SDK Manager but ran into an issue with the "System configuration wizard". Here is the <a href="https://devtalk.nvidia.com/default/topic/1058116/jetpack-4-2-1-fails-to-boot-on-nano-failed-to-start-load-kernel-modules/?offset=1">thread</a> I opened in the Nvidia Developer forum; their technical support is quite responsive.</p>
<p>Insert the SD card, plug in an HDMI monitor cable, USB keyboard, and mouse, then power up the board. Follow the system configuration wizard to finish the system setup.</p>
<h2>Cross-compiling Docker build setup on an X86 machine</h2>
<p>The Nvidia Docker runtime is pre-installed on the OS, which allows you to build a Docker container right on the hardware. However, cross-compiling Docker images on an X86-based machine can save a significant amount of build time, considering its greater processing power and network speed, so the one-time setup of a cross-compiling environment is well worth it. A Docker container will be built on the server, pushed to a Docker registry such as Docker Hub, then pulled from the Jetson Nano.</p>
<p>On your X86 machine, which could be your laptop or a Linux server, first install <a href="https://docs.docker.com/install/linux/docker-ce/ubuntu/">Docker</a> following the official instructions.</p>
<p>Then install <strong>qemu</strong> from the command line; qemu will emulate the Jetson Nano CPU architecture (which is aarch64) on your X86 machine when building Docker containers.</p>
<pre>sudo apt-get install -y qemu binfmt-support qemu-user-static<br/>wget <a href="http://archive.ubuntu.com/ubuntu/pool/main/b/binfmt-support/binfmt-support_2.1.8-2_amd64.deb">http://archive.ubuntu.com/ubuntu/pool/main/b/binfmt-support/binfmt-support_2.1.8-2_amd64.deb</a><br/>sudo apt install ./binfmt-support_2.1.8-2_amd64.deb<br/>rm binfmt-support_2.1.8-2_amd64.deb</pre>
<p>Finally, install podman. We will use it to build containers instead of the default docker command-line interface.</p>
<pre>sudo apt update<br/>sudo apt -y install software-properties-common<br/>sudo add-apt-repository -y ppa:projectatomic/ppa<br/>sudo apt update<br/>sudo apt -y install podman</pre>
<h2>Build a Jetson Nano Docker with TensorFlow GPU</h2>
<p>We build our TensorFlow GPU Docker image based on the official <strong>nvcr.io/nvidia/l4t-base:r32.2</strong> image.</p>
<p>Here is the content of <strong>Dockerfile</strong>.</p>
<pre>FROM nvcr.io/nvidia/l4t-base:r32.2<br/>WORKDIR /<br/>RUN apt update && apt install -y --fix-missing make g++<br/>RUN apt update && apt install -y --fix-missing python3-pip libhdf5-serial-dev hdf5-tools<br/>RUN apt update && apt install -y python3-h5py<br/>RUN pip3 install --pre --no-cache-dir --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu<br/>RUN pip3 install -U numpy<br/>CMD [ "bash" ]</pre>
<p>Then you can pull the base image, build and push the container image to Docker Hub like this. </p>
<pre>podman pull nvcr.io/nvidia/l4t-base:r32.2<br/>podman build -v /usr/bin/qemu-aarch64-static:/usr/bin/qemu-aarch64-static -t docker.io/zcw607/jetson:0.1.0 . -f ./Dockerfile<br/>podman push docker.io/zcw607/jetson:0.1.0</pre>
<p>Change <strong>zcw607</strong> to your own Docker Hub account name as necessary; you might have to run <code>docker login docker.io</code> first before you can push to the registry.</p>
<h2>Build an overlay Docker image (optional)</h2>
<p>By building an overlay Docker image, you can add your own code dependencies/libraries on top of a previous Docker image.</p>
<p>For example, if you want to install the Python pillow library and set up some other stuff, you can create a new Dockerfile like this.</p>
<pre>FROM zcw607/jetson:0.1.0<br/>WORKDIR /home<br/>ENV TZ=Asia/Hong_Kong<br/>RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone && \<br/> apt update && apt install -y python3-pil<br/>CMD [ "bash" ]</pre>
<p>Then run those two lines to build and push the new container.</p>
<pre>podman build -v /usr/bin/qemu-aarch64-static:/usr/bin/qemu-aarch64-static -t docker.io/zcw607/jetson:r1.0.1 . -f ./Dockerfile<br/>podman push docker.io/zcw607/jetson:r1.0.1</pre>
<p>Now that your two Docker images reside in Docker Hub, let's pull them down on the Jetson Nano.</p>
<h2>Run TensorRT model in a Docker container</h2>
<p>In the Jetson Nano command line, pull the Docker image from Docker Hub like this.</p>
<pre>docker pull docker.io/zcw607/jetson:r1.0.1</pre>
<p>Then start the container with the following command.</p>
<pre>docker run --runtime nvidia --network host -it -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix zcw607/jetson:r1.0.1</pre>
<p>To check that TensorFlow GPU is installed, type "python3" at the command line, then run:</p>
<div class="highlight">
<pre><span class="kn">from</span> <span class="nn">tensorflow.python.client</span> <span class="kn">import</span> <span class="n">device_lib</span>
<span class="n">device_lib</span><span class="o">.</span><span class="n">list_local_devices</span><span class="p">()</span>
</pre>
</div>
<p>If everything works, it should print</p>
<p><img alt="tf_gpu" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/a5f860d17b122866708e54f897363db1bed503f9/images/jetson/tf_gpu.png"/></p>
<p>To run the TensorRT model inference benchmark, use <a href="https://raw.githubusercontent.com/Tony607/jetson_nvidia_dockers/master/overlay_example/test_trt_inference.py">my Python script</a>. The model is converted from the Keras MobileNet V2 image classification model. It achieves 30 FPS with 224 by 224 color image input, running inside a Docker container, which is even slightly faster than the 27.18 FPS I measured without a Docker container.</p>
<p><img alt="fps" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/a5f860d17b122866708e54f897363db1bed503f9/images/jetson/fps.png"/></p>
<p>Read <a href="https://www.dlology.com/blog/how-to-run-keras-model-on-jetson-nano/">my previous blog</a> to learn more about how to create your TensorRT model from Keras.</p>
<h2>Conclusion and further reading</h2>
<p>This tutorial showed the complete process of getting a Keras model running on Jetson Nano inside an Nvidia Docker container. You have also learned how to build a Docker container on an X86 machine, push it to Docker Hub, and pull it from the Jetson Nano. Check out my <a href="https://github.com/Tony607/jetson_nvidia_dockers">GitHub repo</a> for the updated Dockerfile, build script, and inference benchmark script.</p>
How to create custom COCO data set for instance segmentation2019-07-27T09:00:37+00:002024-03-19T11:28:17+00:00Chengweihttps://www.dlology.com/blog/author/Chengwei/https://www.dlology.com/blog/how-to-create-custom-coco-data-set-for-instance-segmentation/<p><img alt="anno_coco" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/49e70965a400db89ae7b83c66384651edefbceeb/images/object-detection/anno_coco.png"/></p>
<p>In this post, I will show you how simple it is to create your custom COCO dataset and train an instance segmentation model, quickly and for free, with Google Colab's GPU.</p>
<p>If you just want to know how to create custom COCO data set for object detection, check out my <a href="https://www.dlology.com/blog/how-to-create-custom-coco-data-set-for-object-detection/">previous tutorial</a>.</p>
<p>Instance segmentation annotation is different from object detection annotation, since it requires polygonal annotations instead of bounding boxes. There are many freely available tools, such as labelme and coco-annotator. <a href="https://github.com/wkentaro/labelme">labelme</a> is easy to install and runs on all major OSes; however, it lacks native support for exporting the COCO format annotations that many model training frameworks/pipelines require. <a href="https://github.com/jsbroks/coco-annotator">coco-annotator</a>, on the other hand, is a web-based application, which requires additional effort to get it up and running on your machine. So which way takes the least effort?</p>
<p><span>Here is an overview of how you can make your own COCO dataset for instance segmentation.</span></p>
<ul>
<li><span>Download labelme, run the application and annotate polygons on your images.</span></li>
<li><span>Run my script to convert the labelme annotation files to COCO dataset JSON file.</span></li>
</ul>
<h2><span>Annotate data with labelme</span></h2>
<p><span><a href="https://github.com/wkentaro/labelme">labelme</a> is quite similar to <a href="https://github.com/tzutalin/labelImg">labelImg</a> in bounding box annotation, so anyone familiar with labelImg should have no trouble getting started with labelme.</span></p>
<p><span>You can install labelme as shown below, find prebuilt executables in the <a href="https://github.com/wkentaro/labelme/releases/tag/v3.14.2">release sections</a>, or download the latest <a href="https://github.com/Tony607/labelme2coco/releases/download/V0.1/labelme.exe">Windows 64-bit executable</a> I built earlier.</span></p>
<div class="highlight">
<pre><span class="c1"># python3</span>
<span class="n">conda</span> <span class="n">create</span> <span class="o">--</span><span class="n">name</span><span class="o">=</span><span class="n">labelme</span> <span class="n">python</span><span class="o">=</span><span class="mf">3.6</span>
<span class="n">source</span> <span class="n">activate</span> <span class="n">labelme</span>
<span class="c1"># or "activate labelme" on Windows</span>
<span class="c1"># conda install -c conda-forge pyside2</span>
<span class="c1"># conda install pyqt</span>
<span class="n">pip</span> <span class="n">install</span> <span class="n">pyqt5</span> <span class="c1"># pyqt5 can be installed via pip on python3</span>
<span class="n">pip</span> <span class="n">install</span> <span class="n">labelme</span>
</pre>
</div>
<p><span>When you open the tool, click the "Open Dir" button, navigate to your images folder where all the image files are located, and start drawing polygons. To finish drawing a polygon, press the "Enter" key; the tool will connect the first and last points automatically. When you are done annotating an image, pressing the shortcut key "D" takes you to the next image. I annotated 18 images, each containing multiple objects; it took me about 30 minutes.</span></p>
<p><img alt="labelme" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/8e72e2f60663f2f8fd57511871ecb54cce394ce4/images/object-detection/labelme.png"/></p>
<p>Once you have all the images annotated, you will find a list of JSON files in your images directory, each with the same base file name as its image. Those are labelme annotation files; we will convert them into a single COCO dataset annotation JSON file in the next step (or two JSON files for a train/test split).</p>
<h2><span>Convert labelme annotation files to COCO dataset format</span></h2>
<p><span>You can find the <a href="https://github.com/Tony607/labelme2coco/blob/master/labelme2coco.py">labelme2coco.py</a> file on my GitHub. To apply the conversion, it is only necessary to pass in one argument, the images directory path.</span></p>
<pre><span>python labelme2coco.py images</span></pre>
<p>The script depends on three pip packages: labelme, numpy, and pillow. Go ahead and install any you are missing with pip. After executing the script, you will find a file named <code>trainval.json</code> in the current directory; that is the COCO dataset annotation JSON file.</p>
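<p>Before opening the viewer notebook, a quick sanity check is to load the generated file with the json module and count the top-level entries, something like this:</p>
<div class="highlight">
<pre>import json

with open("trainval.json") as f:
    coco = json.load(f)

print("images:     ", len(coco["images"]))
print("annotations:", len(coco["annotations"]))
print("categories: ", [c["name"] for c in coco["categories"]])
</pre>
</div>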
<p>Then optionally, you can verify the annotation by opening the <a href="https://github.com/Tony607/labelme2coco/blob/master/COCO_Image_Viewer.ipynb">COCO_Image_Viewer.ipynb</a> jupyter notebook.</p>
<p>If everything works, it should show something like below.</p>
<p><img alt="coco_viewer" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/49e70965a400db89ae7b83c66384651edefbceeb/images/object-detection/coco_viewer.png"/></p>
<h2>Train an instance segmentation model with mmdetection framework</h2>
<p>If you are unfamiliar with the mmdetection framework, I suggest giving my previous post a read - "<a href="https://www.dlology.com/blog/how-to-train-an-object-detection-model-with-mmdetection/">How to train an object detection model with mmdetection</a>". The framework allows you to train many object detection and instance segmentation models with configurable backbone networks through the same pipeline; the only thing you need to modify is the model config Python file, where you define the model type, the training epochs, the type and path of the dataset, and so on. For instance segmentation models, several options are available: you can do transfer learning with Mask RCNN or Cascade Mask RCNN using the pre-trained backbone networks. To make it even more beginner-friendly, just run <a href="https://colab.research.google.com/github/Tony607/mmdetection_instance_segmentation_demo/blob/master/mmdetection_train_custom_coco_data_segmentation.ipynb">the Google Colab notebook</a> online with a free GPU and download the final trained model. The notebook is quite similar to the <a href="https://github.com/Tony607/mmdetection_object_detection_demo/blob/master/mmdetection_train_custom_coco_data.ipynb">previous object detection demo</a>, so I will let you run it and play with it.</p>
<p>Here is the final prediction result after training a Mask RCNN model for 20 epochs; training took less than 10 minutes.</p>
<p><img alt="result2" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/49e70965a400db89ae7b83c66384651edefbceeb/images/object-detection/result2.png"/></p>
<p>Feel free to try other model config files, or tweak the existing one by increasing the training epochs or changing the batch size, and see how that might improve the results. Also note that for simplicity, and given the small size of the demo dataset, we skipped the train/test split; you can accomplish that by manually splitting the labelme JSON files into two directories and running the <code>labelme2coco.py</code> script on each directory to generate two COCO annotation JSON files, as shown in the sketch below.</p>
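<p>If you would rather script that split than move files by hand, here is a minimal sketch: it shuffles the labelme JSON files and copies them into train/ and test/ directories, after which you run <code>labelme2coco.py</code> on each. The paths and the 80/20 ratio are illustrative, and you would copy each matching image file alongside its JSON.</p>
<div class="highlight">
<pre>import glob
import os
import random
import shutil

random.seed(42)
files = glob.glob("images/*.json")   # labelme annotation files
random.shuffle(files)
n_train = int(0.8 * len(files))      # 80/20 split, adjust as needed

for out_dir, subset in [("train", files[:n_train]), ("test", files[n_train:])]:
    os.makedirs(out_dir, exist_ok=True)
    for path in subset:
        shutil.copy(path, out_dir)   # copy the matching image file here as well
</pre>
</div>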
<h2><span>Conclusion and further reading</span></h2>
<p><span>Training an instance segmentation model might look daunting, since it can require a significant amount of computing and storage resources. But that didn't keep us from creating one with around 20 annotated images and Colab's free GPU.</span></p>
<h4><em>Resources you might find useful</em></h4>
<p><span>My GitHub repo for the <a href="https://github.com/Tony607/labelme2coco">labelme2coco </a>script, COCO image viewer notebook, and my demo dataset files.</span></p>
<p><span><a href="https://github.com/wkentaro/labelme">labelme Github repo </a>where you can find more information about the annotation tool.</span></p>
<p><span>The <a href="https://github.com/Tony607/mmdetection_instance_segmentation_demo/blob/master/mmdetection_train_custom_coco_data_segmentation.ipynb">notebook </a>you can run to train a mmdetection instance segmentation model on Google Colab.</span></p>
<p><span>Go to the <a href="https://github.com/open-mmlab/mmdetection">mmdetection GitHub repo</a> and know more about the framework.</span></p>
<p><span>My previous post - <a href="https://www.dlology.com/blog/how-to-create-custom-coco-data-set-for-object-detection/">How to create custom COCO data set for object detection</a></span></p>
How to create custom COCO data set for object detection2019-07-16T13:27:36+00:002024-03-19T07:53:40+00:00Chengweihttps://www.dlology.com/blog/author/Chengwei/https://www.dlology.com/blog/how-to-create-custom-coco-data-set-for-object-detection/<p><img alt="voc2coco" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/f2cf610f12e62571974cdcfa19898e31d006346d/images/object-detection/voc2coco.png"/></p>
<p><a href="https://www.dlology.com/blog/how-to-train-an-object-detection-model-with-mmdetection/">Previously</a>, we trained a mmdetection model with a custom annotated dataset in the Pascal VOC data format. You are out of luck if your object detection training pipeline requires the COCO data format, since the labelImg tool we use does not support COCO annotations. If you still want to stick with the tool for annotation and later convert your annotations to COCO format, this post is for you.</p>
<p>We will start with a brief introduction to the two annotation formats, followed by an introduction to the conversion script that converts VOC to COCO format. Finally, we will validate the converted result by plotting the bounding boxes and class labels.</p>
<h2>Pascal VOC and COCO annotations</h2>
<p>Pascal VOC annotations are saved as XML files, one XML file per image. In an XML file generated by the <span>labelImg tool, the path to the image is contained in the <path> element, and each bounding box is stored in an <object> element; an example looks like below.</span></p>
<div class="highlight">
<pre><span class="o"><</span><span class="nb">object</span><span class="o">></span>
<span class="o"><</span><span class="n">name</span><span class="o">></span><span class="n">fig</span><span class="o"></</span><span class="n">name</span><span class="o">></span>
<span class="o"><</span><span class="n">pose</span><span class="o">></span><span class="n">Unspecified</span><span class="o"></</span><span class="n">pose</span><span class="o">></span>
<span class="o"><</span><span class="n">truncated</span><span class="o">></span><span class="mi">0</span><span class="o"></</span><span class="n">truncated</span><span class="o">></span>
<span class="o"><</span><span class="n">difficult</span><span class="o">></span><span class="mi">0</span><span class="o"></</span><span class="n">difficult</span><span class="o">></span>
<span class="o"><</span><span class="n">bndbox</span><span class="o">></span>
<span class="o"><</span><span class="n">xmin</span><span class="o">></span><span class="mi">256</span><span class="o"></</span><span class="n">xmin</span><span class="o">></span>
<span class="o"><</span><span class="n">ymin</span><span class="o">></span><span class="mi">27</span><span class="o"></</span><span class="n">ymin</span><span class="o">></span>
<span class="o"><</span><span class="n">xmax</span><span class="o">></span><span class="mi">381</span><span class="o"></</span><span class="n">xmax</span><span class="o">></span>
<span class="o"><</span><span class="n">ymax</span><span class="o">></span><span class="mi">192</span><span class="o"></</span><span class="n">ymax</span><span class="o">></span>
<span class="o"></</span><span class="n">bndbox</span><span class="o">></span>
<span class="o"></</span><span class="nb">object</span><span class="o">></span>
</pre>
</div>
<p><span>As you can see, the bounding box is defined by two points: the upper-left and bottom-right corners.</span></p>
<p><span>For the COCO data format, first of all, there is only a single JSON file for all the annotations in a dataset, or one for each split of the dataset (train/val/test).</span></p>
<p><span>The bounding box is expressed as the upper-left starting coordinate plus the box width and height, like <code>"bbox" : [x, y, width, height]</code>.</span></p>
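<p>Converting a single box between the two formats is therefore one line of arithmetic: subtract the corners to get the width and height. A minimal helper (the function name is mine, not from the conversion script) looks like this:</p>
<div class="highlight">
<pre>def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Pascal VOC corner format to COCO [x, y, width, height]."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]

# Example: the bndbox values from the VOC snippet above
print(voc_to_coco_bbox(256, 27, 381, 192))  # [256, 27, 125, 165]
</pre>
</div>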
<p>Here is an example of a COCO data format JSON file. It contains just one image, as seen in the top-level "images" element; 3 unique categories/classes in total, as seen in the top-level "categories" element; and 2 annotated bounding boxes for the image, as seen in the top-level "annotations" element.</p>
<div class="highlight">
<pre><span class="p">{</span>
<span class="s">"type"</span><span class="p">:</span> <span class="s">"instances"</span><span class="p">,</span>
<span class="s">"images"</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"file_name"</span><span class="p">:</span> <span class="s">"0.jpg"</span><span class="p">,</span>
<span class="s">"height"</span><span class="p">:</span> <span class="mi">600</span><span class="p">,</span>
<span class="s">"width"</span><span class="p">:</span> <span class="mi">800</span><span class="p">,</span>
<span class="s">"id"</span><span class="p">:</span> <span class="mi">0</span>
<span class="p">}</span>
<span class="p">],</span>
<span class="s">"categories"</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"supercategory"</span><span class="p">:</span> <span class="s">"none"</span><span class="p">,</span>
<span class="s">"name"</span><span class="p">:</span> <span class="s">"date"</span><span class="p">,</span>
<span class="s">"id"</span><span class="p">:</span> <span class="mi">0</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s">"supercategory"</span><span class="p">:</span> <span class="s">"none"</span><span class="p">,</span>
<span class="s">"name"</span><span class="p">:</span> <span class="s">"hazelnut"</span><span class="p">,</span>
<span class="s">"id"</span><span class="p">:</span> <span class="mi">2</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s">"supercategory"</span><span class="p">:</span> <span class="s">"none"</span><span class="p">,</span>
<span class="s">"name"</span><span class="p">:</span> <span class="s">"fig"</span><span class="p">,</span>
<span class="s">"id"</span><span class="p">:</span> <span class="mi">1</span>
<span class="p">}</span>
<span class="p">],</span>
<span class="s">"annotations"</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s">"bbox"</span><span class="p">:</span> <span class="p">[</span>
<span class="mi">100</span><span class="p">,</span>
<span class="mi">116</span><span class="p">,</span>
<span class="mi">140</span><span class="p">,</span>
<span class="mi">170</span>
<span class="p">],</span>
<span class="s">"image_id"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"segmentation"</span><span class="p">:</span> <span class="p">[],</span>
<span class="s">"ignore"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"area"</span><span class="p">:</span> <span class="mi">23800</span><span class="p">,</span>
<span class="s">"iscrowd"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"category_id"</span><span class="p">:</span> <span class="mi">0</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s">"id"</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span>
<span class="s">"bbox"</span><span class="p">:</span> <span class="p">[</span>
<span class="mi">321</span><span class="p">,</span>
<span class="mi">320</span><span class="p">,</span>
<span class="mi">142</span><span class="p">,</span>
<span class="mi">102</span>
<span class="p">],</span>
<span class="s">"image_id"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"segmentation"</span><span class="p">:</span> <span class="p">[],</span>
<span class="s">"ignore"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"area"</span><span class="p">:</span> <span class="mi">14484</span><span class="p">,</span>
<span class="s">"iscrowd"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"category_id"</span><span class="p">:</span> <span class="mi">0</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="p">}</span>
</pre>
</div>
<h2><span>Convert Pascal VOC to COCO annotation</span></h2>
<p><span>Once you have some annotated XML and image files, put them in a folder structure similar to the one below.</span></p>
<pre>data<br/> └── VOC2007<br/> ├── Annotations<br/> │ ├── 0.xml<br/> │ ├── ...<br/> │ └── 9.xml<br/> └── JPEGImages<br/> ├── 0.jpg<br/> ├── ...<br/> └── 9.jpg</pre>
<p><span>Then you can run the <a href="https://github.com/Tony607/voc2coco/blob/master/voc2coco.py">voc2coco.py</a> script from my GitHub like this, which will generate a COCO data formatted JSON file for you.</span></p>
<pre><span>python voc2coco.py ./data/VOC/Annotations ./data/coco/output.json</span></pre>
<p><span>Once we have the JSON file, we can visualize the COCO annotations by drawing bounding boxes and class labels as overlays on the image. Open <a href="https://github.com/Tony607/voc2coco/blob/master/COCO_Image_Viewer.ipynb">COCO_Image_Viewer.ipynb</a> in Jupyter notebook and find the following cell, which calls the <code>display_image</code> method to generate an SVG graph right inside the notebook.</span></p>
<div class="highlight">
<pre><span class="n">html</span> <span class="o">=</span> <span class="n">coco_dataset</span><span class="o">.</span><span class="n">display_image</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">use_url</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">IPython</span><span class="o">.</span><span class="n">display</span><span class="o">.</span><span class="n">HTML</span><span class="p">(</span><span class="n">html</span><span class="p">)</span>
</pre>
</div>
<p>The first argument is the image id; our demo dataset has 18 images in total, so you can try setting it from 0 to 17.</p>
<p><img alt="vis_8" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/f2cf610f12e62571974cdcfa19898e31d006346d/images/object-detection/vis_8.png"/></p>
<h2><span>Conclusion and further reading</span></h2>
<p><span>In this quick tutorial, you have learned how you can stick with the popular <a href="https://tzutalin.github.io/labelImg/">labelImg</a> tool for custom dataset annotation, and later convert the Pascal VOC annotations to the COCO dataset format, so you can train an object detection pipeline that requires COCO-format datasets.</span></p>
<h4><em>You might find the following links useful,</em></h4>
<p><a href="https://www.dlology.com/blog/how-to-train-an-object-detection-model-with-mmdetection/">How to train an object detection model with mmdetection</a> - my previous post about creating custom Pascal VOC annotation files and training an object detection model with the PyTorch mmdetection framework.</p>
<p><a href="http://cocodataset.org/#format-data">COCO data format</a></p>
<p><a href="https://pjreddie.com/media/files/VOC2012_doc.pdf">Pascal VOC documentation</a></p>
<p>Download <a href="https://tzutalin.github.io/labelImg/">labelImg</a> for the bounding box annotation.</p>
<p>Get the source code for this post, check out <a href="https://github.com/Tony607/voc2coco">my GitHub repo</a>.</p>How to train an object detection model with mmdetection2019-06-23T13:37:01+00:002024-03-19T11:02:12+00:00Chengweihttps://www.dlology.com/blog/author/Chengwei/https://www.dlology.com/blog/how-to-train-an-object-detection-model-with-mmdetection/<p><img alt="mmdetection_colab" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/fd0529cefd0eaaefe4415289b81a0a11dc8fc5ef/images/mmdetection/mmdetection_colab.png"/></p>
<p>A while back you learned how to train an object detection model with the TensorFlow object detection API and Google Colab's free GPU; if you haven't, check it out in <a href="https://www.dlology.com/blog/how-to-train-an-object-detection-model-easy-for-free/">the post</a>. The models in TensorFlow object detection are quite dated and missing updates for state-of-the-art models like Cascade RCNN and RetinaNet. There is a PyTorch counterpart called <a href="https://github.com/open-mmlab/mmdetection">mmdetection</a>, which includes more pre-trained, state-of-the-art object detection models for us to train custom data with; however, setting it up requires a nontrivial amount of time installing the environment, setting up the config file, and putting the dataset in the right format. The good news is you can skip those boring steps and jump directly into the fun part: training your model.</p>
<p>Here is an overview of how to make it happen,</p>
<p>1. Annotate some images, and make train/test split.</p>
<p>2. Run <a href="https://colab.research.google.com/github/Tony607/mmdetection_object_detection_demo/blob/master/mmdetection_train_custom_data.ipynb">the Colab notebook</a> to train your model.</p>
<h2>Step 1: <span>Annotate some images and make train/test split</span></h2>
<p><span>This step is only necessary if you want to use your own images instead of the ones that come with <strong><a href="https://github.com/Tony607/mmdetection_object_detection_demo">my repository</a></strong>. Start by forking <a href="https://github.com/Tony607/mmdetection_object_detection_demo">my repository</a> and delete the <code>data</code> folder in the project directory so you can start fresh with your custom data.</span></p>
<p>If you took your images with your phone, the resolution might be 2K or 4K depending on your phone's settings. In that case, we will scale the images down to reduce the overall dataset size and speed up training.</p>
<p><span>You can use the <a href="https://github.com/Tony607/mmdetection_object_detection_demo/blob/master/resize_images.py">resize_images.py</a> script in the repository to resize your images.</span></p>
<p>First, save all your photos to one folder outside of the project directory so they won't get accidentally uploaded to GitHub later. Ideally, all photos come with the <code>jpg</code> extension. Then run this script to resize all photos and save them to the project directory.</p>
<pre>python resize_images.py --raw-dir <photo_directory> --save-dir ./data/VOC2007/JPEGImages --ext jpg --target-size "(800, 600)"</pre>
<p>You might wonder why "VOC" appears in the path; that is because the annotation tool we use generates <a href="http://host.robots.ox.ac.uk/pascal/VOC/">Pascal VOC</a> formatted annotation XML files. It is not necessary to dig into the actual format of the XML file since the annotation tool handles all of that. You guessed it, it is the same tool we used <a href="https://www.dlology.com/blog/how-to-train-an-object-detection-model-easy-for-free/">previously</a>, <strong><a href="https://tzutalin.github.io/labelImg/">LabelImg</a></strong>, which works on both Windows and Linux.</p>
<p><a href="https://tzutalin.github.io/labelImg/">Download LabelImg </a>and open it up,</p>
<p>1. Verify "<strong>PascalVOC</strong>" is selected, that is the default annotation format. </p>
<p>2. Open your resized image folder "<code>./data/VOC2007/JPEGImages</code>" for annotation.</p>
<p>3. Change the save directory for the XML annotation files to "<code>./data/VOC2007/Annotations</code>".</p>
<p><img alt="labelimg" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/fd0529cefd0eaaefe4415289b81a0a11dc8fc5ef/images/mmdetection/labelimg.png"/></p>
<p><em>As usual, use shortcuts (<code>w</code>: draw box, <code>d</code>: next file, <code>a</code>: previous file, etc.) to accelerate the annotation.</em></p>
<p>Once it is done, you will find the XML files in the "<code>./data/VOC2007/Annotations</code>" folder, with the same file base names as your image files.</p>
<p><span>For the train/test split, you are going to create two files, each containing a list of file base names, one name per line. The two text files, named <code>trainval.txt</code> and <code>test.txt</code> respectively, live in the "<code>data/VOC2007/ImageSets/Main</code>" folder. If you don't want to type all the file names by hand, cd into the "<code>Annotations</code>" directory and run this shell command,</span></p>
<div class="highlight">
<pre><span class="n">ls</span> <span class="o">-</span><span class="mi">1</span> <span class="o">|</span> <span class="n">sed</span> <span class="o">-</span><span class="n">e</span> <span class="s">'s/\.xml$//'</span> <span class="o">|</span> <span class="n">sort</span> <span class="o">-</span><span class="n">n</span>
</pre>
</div>
<p>That will give you a list of nicely sorted file base names; just split them into two parts and paste them into the two text files.</p>
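<p>If you prefer to script the split instead, here is a minimal sketch that writes both files with an 80/20 split, assuming it runs from inside the "<code>data/VOC2007</code>" directory:</p>
<div class="highlight">
<pre>import os
import random

# Collect the file base names from the XML annotations.
names = sorted(os.path.splitext(f)[0] for f in os.listdir("Annotations") if f.endswith(".xml"))

random.seed(42)  # fixed seed for a reproducible split
random.shuffle(names)
split = int(len(names) * 0.8)

os.makedirs("ImageSets/Main", exist_ok=True)
with open("ImageSets/Main/trainval.txt", "w") as f:
    f.write("\n".join(names[:split]) + "\n")
with open("ImageSets/Main/test.txt", "w") as f:
    f.write("\n".join(names[split:]) + "\n")
</pre>
</div>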
<p>Now you have the <code>data</code> directory structure similar to this one below.</p>
<div class="highlight">
<pre><span class="n">data</span>
<span class="err">└──</span> <span class="n">VOC2007</span>
<span class="err">├──</span> <span class="n">Annotations</span>
<span class="err">│</span> <span class="err">├──</span> <span class="mf">0.</span><span class="n">xml</span>
<span class="err">│</span> <span class="err">├──</span> <span class="o">...</span>
<span class="err">│</span> <span class="err">└──</span> <span class="mf">9.</span><span class="n">xml</span>
<span class="err">├──</span> <span class="n">ImageSets</span>
<span class="err">│</span> <span class="err">└──</span> <span class="n">Main</span>
<span class="err">│</span> <span class="err">├──</span> <span class="n">test</span><span class="o">.</span><span class="n">txt</span>
<span class="err">│</span> <span class="err">└──</span> <span class="n">trainval</span><span class="o">.</span><span class="n">txt</span>
<span class="err">└──</span> <span class="n">JPEGImages</span>
<span class="err">├──</span> <span class="mf">0.j</span><span class="n">pg</span>
<span class="err">├──</span> <span class="o">...</span>
<span class="err">└──</span> <span class="mf">9.j</span><span class="n">pg</span>
</pre>
</div>
<p>Update your fork of the <a href="https://github.com/Tony607/mmdetection_object_detection_demo">GitHub repository</a> with your labeled datasets so you can clone it with Colab.</p>
<div class="highlight">
<pre><span class="n">git</span> <span class="n">add</span> <span class="o">--</span><span class="n">al</span>
<span class="n">git</span> <span class="n">commit</span> <span class="o">-</span><span class="n">m</span> <span class="s">"Update datasets"</span>
<span class="n">git</span> <span class="n">push</span>
</pre>
</div>
<h2>Step 2: <span>Train the model on Colab Notebook</span></h2>
<p>We are ready to launch the <a href="https://colab.research.google.com/github/Tony607/mmdetection_object_detection_demo/blob/master/mmdetection_train_custom_data.ipynb">Colab notebook</a> and fire up the training. Similar to the TensorFlow object detection API, instead of <span>training the model from scratch, we do transfer learning from a pre-trained</span> backbone, such as resnet50, specified in the model config file.</p>
<p>The notebook allows you to select the model config and set the number of training epochs.</p>
<p>Right now, I have only tested two model configs, <strong>faster_rcnn_r50_fpn_1x</strong> and <strong>cascade_rcnn_r50_fpn_1x</strong>, but other configs can be incorporated as demonstrated in the notebook.</p>
<p>The notebook handles several things before training the model,</p>
<ol>
<li>Installing <strong>mmdetection</strong> and its dependencies.</li>
<li>Replacing "<strong>CLASSES</strong>" in <strong>voc.py</strong> file with your custom dataset class labels.</li>
<li>Modifying your selected model config file. Things like updating the number of classes to match with your dataset, changing dataset type to <strong>VOCDataset</strong>, setting the total training epoch number and more.</li>
</ol>
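<p>For illustration, step 2 boils down to a small text rewrite of <strong>voc.py</strong>. Here is a sketch of what that could look like; the path and the class tuple are assumptions for illustration, not the notebook's exact code:</p>
<div class="highlight">
<pre>import re

# Hypothetical path to voc.py inside the cloned mmdetection repo.
voc_py = "mmdetection/mmdet/datasets/voc.py"
custom_classes = "CLASSES = ('raccoon',)"  # put your own labels here

with open(voc_py) as f:
    source = f.read()
# Swap the original CLASSES tuple for the custom one.
source = re.sub(r"CLASSES = \(.*?\)", custom_classes, source, flags=re.S)
with open(voc_py, "w") as f:
    f.write(source)
</pre>
</div>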
<p>After that, it will re-run the <strong>mmdetection</strong> package installation script so the changes to the <strong>voc.py</strong> file propagate to the installed python packages.</p>
<div class="highlight">
<pre><span class="o">%</span><span class="n">cd</span> <span class="p">{</span><span class="n">mmdetection_dir</span><span class="p">}</span>
<span class="err">!</span><span class="n">python</span> <span class="n">setup</span><span class="o">.</span><span class="n">py</span> <span class="n">install</span>
</pre>
</div>
<p>Since your data directory resides outside of the <strong>mmdetection</strong> directory, we have the following cell in the notebook which creates a symbolic link into the project data directory.</p>
<div class="highlight">
<pre><span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="s">"data/VOCdevkit"</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">voc2007_dir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">project_name</span><span class="p">,</span> <span class="s">"data/VOC2007"</span><span class="p">)</span>
<span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s">"ln -s {} data/VOCdevkit"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">voc2007_dir</span><span class="p">))</span>
</pre>
</div>
<p>Then start the training.</p>
<div class="highlight">
<pre><span class="err">!</span><span class="n">python</span> <span class="n">tools</span><span class="o">/</span><span class="n">train</span><span class="o">.</span><span class="n">py</span> <span class="p">{</span><span class="n">config_fname</span><span class="p">}</span>
</pre>
</div>
<p>The training time depends on the size of your dataset and the number of training epochs; my demo takes several minutes to complete with Colab's Tesla T4 GPU.</p>
<p>After training, you can test drive the model with an image in the test set like so.</p>
<div class="highlight">
<pre><span class="o">%</span><span class="n">cd</span> <span class="p">{</span><span class="n">mmdetection_dir</span><span class="p">}</span>
<span class="kn">from</span> <span class="nn">mmcv.runner</span> <span class="kn">import</span> <span class="n">load_checkpoint</span>
<span class="kn">from</span> <span class="nn">mmdet.apis</span> <span class="kn">import</span> <span class="n">inference_detector</span><span class="p">,</span> <span class="n">show_result</span><span class="p">,</span> <span class="n">init_detector</span>
<span class="n">checkpoint_file</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">mmdetection_dir</span><span class="p">,</span> <span class="n">work_dir</span><span class="p">,</span> <span class="s">"latest.pth"</span><span class="p">)</span>
<span class="n">score_thr</span> <span class="o">=</span> <span class="mf">0.8</span>
<span class="c1"># build the model from a config file and a checkpoint file</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">init_detector</span><span class="p">(</span><span class="n">config_fname</span><span class="p">,</span> <span class="n">checkpoint_file</span><span class="p">)</span>
<span class="c1"># test a single image and show the results</span>
<span class="n">img</span> <span class="o">=</span> <span class="s">'data/VOCdevkit/VOC2007/JPEGImages/15.jpg'</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">inference_detector</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">img</span><span class="p">)</span>
<span class="n">show_result</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="n">result</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">CLASSES</span><span class="p">,</span> <span class="n">score_thr</span><span class="o">=</span><span class="n">score_thr</span><span class="p">,</span> <span class="n">out_file</span><span class="o">=</span><span class="s">"result.jpg"</span><span class="p">)</span>
<span class="c1"># Show the image with bbox overlays.</span>
<span class="kn">from</span> <span class="nn">IPython.display</span> <span class="kn">import</span> <span class="n">Image</span>
<span class="n">Image</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s">'result.jpg'</span><span class="p">)</span>
</pre>
</div>
<p>And here is the result as you expected,</p>
<p><img alt="result" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/fd0529cefd0eaaefe4415289b81a0a11dc8fc5ef/images/mmdetection/result.jpg"/></p>
<h2>Conclusion and further reading</h2>
<p>This tutorial shows you how to train a Pytorch <strong>mmdetection</strong> object detection model on your custom dataset with minimal effort, on a Google Colab Notebook.</p>
<p>If you are using my GitHub repo, you probably noticed that <strong>mmdetection</strong> is included as a submodule; to update it in the future, run this command.</p>
<div class="highlight">
<pre><span class="n">git</span> <span class="n">submodule</span> <span class="n">update</span> <span class="o">--</span><span class="n">recursive</span>
</pre>
</div>
<p>Considering training with another model config? You can find a list of config files <a href="https://github.com/open-mmlab/mmdetection/tree/master/configs">here</a> as well as <a href="https://github.com/open-mmlab/mmdetection/blob/master/MODEL_ZOO.md#baselines">their specs</a>, such as complexity (<strong>Mem (GB)</strong>) and accuracy (<strong>box AP</strong>). Then start by adding the config file to <strong>MODELS_CONFIG</strong> at the start of the <a href="https://colab.research.google.com/github/Tony607/mmdetection_object_detection_demo/blob/master/mmdetection_train_custom_data.ipynb">notebook</a>.</p>
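<p>For reference, the selection could be organized as a simple dictionary keyed by config name. The structure below is illustrative only; check the notebook for the exact keys it expects:</p>
<div class="highlight">
<pre># Illustrative sketch -- the actual MODELS_CONFIG in the notebook may differ.
MODELS_CONFIG = {
    "faster_rcnn_r50_fpn_1x": {
        "config_file": "configs/faster_rcnn_r50_fpn_1x.py",
    },
    "cascade_rcnn_r50_fpn_1x": {
        "config_file": "configs/cascade_rcnn_r50_fpn_1x.py",
    },
}
</pre>
</div>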
<p>Resources you might find helpful,</p>
<ul>
<li><a href="https://github.com/open-mmlab/mmdetection">mmdetection</a> - GitHub repository.</li>
<li><a href="https://tzutalin.github.io/labelImg/">LabelImg</a> - The Annotation tool used in this tutorial.</li>
<li><a href="https://github.com/Tony607/mmdetection_object_detection_demo">my repository</a> for this tutorial.</li>
</ul>
<p>In future posts, we will look into benchmarking those custom-trained models as well as deploying them to edge computing devices. Stay tuned and happy coding!</p>How to do Transfer learning with Efficientnet2019-06-09T03:16:06+00:002024-03-19T11:04:16+00:00Chengweihttps://www.dlology.com/blog/author/Chengwei/https://www.dlology.com/blog/transfer-learning-with-efficientnet/<p><img alt="transfer" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/36894ad880dc3e645513efc36cc070c4cd0d3d7c/images/efficientnet/transfer.png"/></p>
<p>In this tutorial, you will learn how to create an image classification neural network to classify your custom images. The network will be based on the latest EfficientNet, which has achieved state of the art accuracy on ImageNet while being <span class="fontstyle0">8.4x smaller </span><span class="fontstyle2">and </span><span class="fontstyle0">6.1x faster.</span></p>
<h2>Why <span>EfficientNet?</span></h2>
<p>Compared to other models achieving similar ImageNet accuracy, EfficientNet is much smaller. For example, the ResNet50 model available in Keras applications has 23,534,592 parameters in total, yet it still underperforms the smallest EfficientNet, which has only <span>5,330,564 parameters in total.</span></p>
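<p>You can verify the ResNet50 figure yourself with a couple of lines of Keras; a quick sketch:</p>
<div class="highlight">
<pre>import tensorflow as tf

# Instantiate ResNet50 without downloading weights, just to count parameters.
resnet50 = tf.keras.applications.ResNet50(weights=None)
print(resnet50.count_params())  # compare with the figure quoted above
</pre>
</div>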
<p><span>Why is it so efficient? To answer that question, we will dive into its base model and building block. You might have heard that the building blocks of the classical ResNet model are the identity and convolution blocks.</span></p>
<p><span><span class="fontstyle0">For EfficientNet, its main building block is mobile <strong>inverted bottleneck</strong> MBConv, which was first introduced in <a href="https://arxiv.org/abs/1801.04381">MobileNetV2</a>. By using<span class="fontstyle0"> shortcuts directly between the bottlenecks</span> which connects a much fewer number of channels compared to expansion layers<span class="fontstyle0">, combined with <strong>d</strong></span></span></span><strong>epthwise separable convolution</strong> which e<span class="fontstyle0">ffectively reduces computation by almost a factor of </span><span class="fontstyle2">k</span><span class="fontstyle3"><sup>2</sup>, compared to traditional layers. Where k stands for the kernel size, specifying the height and width of the 2D convolution window.</span></p>
<p><img alt="building_blocks" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/36894ad880dc3e645513efc36cc070c4cd0d3d7c/images/efficientnet/building_blocks.png"/></p>
<p><span><span class="fontstyle0">T<span class="fontstyle0">he authors also add <a href="https://arxiv.org/abs/1709.01507">squeeze-and-excitation</a>(SE) optimization, which contributes to further <span class="fontstyle0">performance improvements.</span></span><br/> </span><span class="fontstyle0">The second benefit of EfficientNet, it scales more efficiently by carefully balancing network depth, width, and resolution, which lead to better performance.</span></span></p>
<p><img alt="size_vs_accuracy" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/36894ad880dc3e645513efc36cc070c4cd0d3d7c/images/efficientnet/size_vs_accuracy.png"/></p>
<p><span><span class="fontstyle0">As you can see, starting from the smallest EfficientNet configuration B0 to the largest B7, accuracies are steady increasing while maintaining a relatively small size.</span></span></p>
<h2><span><span class="fontstyle0">Transfer Learning with EfficientNet</span></span></h2>
<p><span><span class="fontstyle0">It is fine if you are not entirely sure what I am talking about in the previous section. Transfer learning for image classification is more or less model agnostic. You can pick any other pre-trained ImageNet model such as MobileNetV2 or ResNet50 as a <span class="fontstyle0">drop-in replacement</span> if you want.<br/> </span></span></p>
<p><span><span class="fontstyle0">A pre-trained network is simply a saved network previously trained on a large dataset such as ImageNet. The learned features can prove useful for many different computer vision problems, even though these new problems might involve completely different classes from those of the original task. <span>For instance, one might train a network on ImageNet (where classes are mostly animals and everyday objects) and then re-purpose this trained network for something as remote as identifying the <a href="https://ai.stanford.edu/~jkrause/cars/car_dataset.html">car models</a> in images. For this tutorial, we expect the model to perform well on our cat vs. dog classification problem with a relatively small number of samples.</span></span></span></p>
<p>The easiest way to get started is by opening <a href="https://github.com/Tony607/efficientnet_keras_transfer_learning/blob/master/Keras_efficientnet_transfer_learning.ipynb">this notebook</a> in Colab, while I will explain more detail here in this post.</p>
<p>First clone my repository which contains the Tensorflow Keras implementation of the EfficientNet, then cd into the directory.</p>
<div class="highlight">
<pre><span class="err">!</span><span class="n">git</span> <span class="n">clone</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">github</span><span class="o">.</span><span class="n">com</span><span class="o">/</span><span class="n">Tony607</span><span class="o">/</span><span class="n">efficientnet_keras_transfer_learning</span>
<span class="o">%</span><span class="n">cd</span> <span class="n">efficientnet_keras_transfer_learning</span><span class="o">/</span>
</pre>
</div>
<p>EfficientNet is built for ImageNet classification, which contains 1,000 class labels; our dataset only has 2. That means the last few classification layers are not useful to us. They can be excluded when loading the model by setting the <code>include_top</code> argument to False, and this applies to the other ImageNet models made available in <a href="https://keras.io/applications/">Keras applications</a> as well.</p>
<div class="highlight">
<pre><span class="c1"># Options: EfficientNetB0, EfficientNetB1, EfficientNetB2, EfficientNetB3</span>
<span class="c1"># Higher the number, the more complex the model is.</span>
<span class="kn">from</span> <span class="nn">efficientnet</span> <span class="kn">import</span> <span class="n">EfficientNetB0</span> <span class="k">as</span> <span class="n">Net</span>
<span class="kn">from</span> <span class="nn">efficientnet</span> <span class="kn">import</span> <span class="n">center_crop_and_resize</span><span class="p">,</span> <span class="n">preprocess_input</span>
<span class="c1"># loading pretrained conv base model</span>
<span class="n">conv_base</span> <span class="o">=</span> <span class="n">Net</span><span class="p">(</span><span class="n">weights</span><span class="o">=</span><span class="s">"imagenet"</span><span class="p">,</span> <span class="n">include_top</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="n">input_shape</span><span class="p">)</span>
</pre>
</div>
<p>Next, we create our own classification layers and stack them on top of the EfficientNet convolutional base model. We adopt <code>GlobalMaxPooling2D</code> to convert the 4D <span><code>(batch_size, rows, cols, channels)</code> tensor into a 2D tensor with shape <code>(batch_size, channels)</code>. <code>GlobalMaxPooling2D</code> produces far fewer features than the <code>Flatten</code> layer, which effectively reduces the number of parameters.</span></p>
<div class="highlight">
<pre><span class="kn">from</span> <span class="nn">tensorflow.keras</span> <span class="kn">import</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">tensorflow.keras</span> <span class="kn">import</span> <span class="n">layers</span>
<span class="n">dropout_rate</span> <span class="o">=</span> <span class="mf">0.2</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">Sequential</span><span class="p">()</span>
<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">conv_base</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">layers</span><span class="o">.</span><span class="n">GlobalMaxPooling2D</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">"gap"</span><span class="p">))</span>
<span class="c1"># model.add(layers.Flatten(name="flatten"))</span>
<span class="k">if</span> <span class="n">dropout_rate</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">layers</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">dropout_rate</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"dropout_out"</span><span class="p">))</span>
<span class="c1"># model.add(layers.Dense(256, activation='relu', name="fc1"))</span>
<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">"softmax"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"fc_out"</span><span class="p">))</span>
</pre>
</div>
<p>To keep the convolutional base's weights untouched, we freeze it; otherwise, the representations previously learned from the ImageNet dataset would be destroyed.</p>
<div class="highlight">
<pre><span class="n">conv_base</span><span class="o">.</span><span class="n">trainable</span> <span class="o">=</span> <span class="bp">False</span>
</pre>
</div>
<p>Then you can download and unzip the <code>dog_vs_cat</code> data from Microsoft.</p>
<div class="highlight">
<pre><span class="err">!</span><span class="n">wget</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">download</span><span class="o">.</span><span class="n">microsoft</span><span class="o">.</span><span class="n">com</span><span class="o">/</span><span class="n">download</span><span class="o">/</span><span class="mi">3</span><span class="o">/</span><span class="n">E</span><span class="o">/</span><span class="mi">1</span><span class="o">/</span><span class="mf">3E1</span><span class="n">C3F21</span><span class="o">-</span><span class="n">ECDB</span><span class="o">-</span><span class="mi">4869</span><span class="o">-</span><span class="mi">8368</span><span class="o">-</span><span class="mi">6</span><span class="n">DEBA77B919F</span><span class="o">/</span><span class="n">kagglecatsanddogs_3367a</span><span class="o">.</span><span class="n">zip</span>
<span class="err">!</span><span class="n">unzip</span> <span class="o">-</span><span class="n">qq</span> <span class="n">kagglecatsanddogs_3367a</span><span class="o">.</span><span class="n">zip</span> <span class="o">-</span><span class="n">d</span> <span class="n">dog_vs_cat</span>
</pre>
</div>
<p>Several cells in <a href="https://github.com/Tony607/efficientnet_keras_transfer_learning/blob/master/Keras_efficientnet_transfer_learning.ipynb">the Notebook</a> are dedicated to sampling a subset of images from the original dataset to form the train/validation/test sets, after which you will see.</p>
<div class="highlight">
<pre><span class="n">total</span> <span class="n">training</span> <span class="n">cat</span> <span class="n">images</span><span class="p">:</span> <span class="mi">1000</span>
<span class="n">total</span> <span class="n">training</span> <span class="n">dog</span> <span class="n">images</span><span class="p">:</span> <span class="mi">1000</span>
<span class="n">total</span> <span class="n">validation</span> <span class="n">cat</span> <span class="n">images</span><span class="p">:</span> <span class="mi">500</span>
<span class="n">total</span> <span class="n">validation</span> <span class="n">dog</span> <span class="n">images</span><span class="p">:</span> <span class="mi">500</span>
<span class="n">total</span> <span class="n">test</span> <span class="n">cat</span> <span class="n">images</span><span class="p">:</span> <span class="mi">500</span>
<span class="n">total</span> <span class="n">test</span> <span class="n">dog</span> <span class="n">images</span><span class="p">:</span> <span class="mi">500</span>
</pre>
</div>
<p>Then you can compile and train the model with Keras's <code>ImageDataGenerator</code>, which adds various data augmentation options during the training to reduce the chance of overfitting.</p>
<div class="highlight">
<pre><span class="kn">from</span> <span class="nn">tensorflow.keras.preprocessing.image</span> <span class="kn">import</span> <span class="n">ImageDataGenerator</span>
<span class="n">train_datagen</span> <span class="o">=</span> <span class="n">ImageDataGenerator</span><span class="p">(</span>
<span class="n">rescale</span><span class="o">=</span><span class="mf">1.0</span> <span class="o">/</span> <span class="mi">255</span><span class="p">,</span>
<span class="n">rotation_range</span><span class="o">=</span><span class="mi">40</span><span class="p">,</span>
<span class="n">width_shift_range</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span>
<span class="n">height_shift_range</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span>
<span class="n">shear_range</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span>
<span class="n">zoom_range</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span>
<span class="n">horizontal_flip</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">fill_mode</span><span class="o">=</span><span class="s">"nearest"</span><span class="p">,</span>
<span class="p">)</span>
<span class="c1"># Note that the validation data should not be augmented!</span>
<span class="n">test_datagen</span> <span class="o">=</span> <span class="n">ImageDataGenerator</span><span class="p">(</span><span class="n">rescale</span><span class="o">=</span><span class="mf">1.0</span> <span class="o">/</span> <span class="mi">255</span><span class="p">)</span>
<span class="n">train_generator</span> <span class="o">=</span> <span class="n">train_datagen</span><span class="o">.</span><span class="n">flow_from_directory</span><span class="p">(</span>
<span class="c1"># This is the target directory</span>
<span class="n">train_dir</span><span class="p">,</span>
<span class="c1"># All images will be resized to target height and width.</span>
<span class="n">target_size</span><span class="o">=</span><span class="p">(</span><span class="n">height</span><span class="p">,</span> <span class="n">width</span><span class="p">),</span>
<span class="n">batch_size</span><span class="o">=</span><span class="n">batch_size</span><span class="p">,</span>
<span class="c1"># Since we use categorical_crossentropy loss, we need categorical labels</span>
<span class="n">class_mode</span><span class="o">=</span><span class="s">"categorical"</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">validation_generator</span> <span class="o">=</span> <span class="n">test_datagen</span><span class="o">.</span><span class="n">flow_from_directory</span><span class="p">(</span>
<span class="n">validation_dir</span><span class="p">,</span>
<span class="n">target_size</span><span class="o">=</span><span class="p">(</span><span class="n">height</span><span class="p">,</span> <span class="n">width</span><span class="p">),</span>
<span class="n">batch_size</span><span class="o">=</span><span class="n">batch_size</span><span class="p">,</span>
<span class="n">class_mode</span><span class="o">=</span><span class="s">"categorical"</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span>
<span class="n">loss</span><span class="o">=</span><span class="s">"categorical_crossentropy"</span><span class="p">,</span>
<span class="n">optimizer</span><span class="o">=</span><span class="n">optimizers</span><span class="o">.</span><span class="n">RMSprop</span><span class="p">(</span><span class="n">lr</span><span class="o">=</span><span class="mf">2e-5</span><span class="p">),</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s">"acc"</span><span class="p">],</span>
<span class="p">)</span>
<span class="n">history</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">fit_generator</span><span class="p">(</span>
<span class="n">train_generator</span><span class="p">,</span>
<span class="n">steps_per_epoch</span><span class="o">=</span><span class="n">NUM_TRAIN</span> <span class="o">//</span> <span class="n">batch_size</span><span class="p">,</span>
<span class="n">epochs</span><span class="o">=</span><span class="n">epochs</span><span class="p">,</span>
<span class="n">validation_data</span><span class="o">=</span><span class="n">validation_generator</span><span class="p">,</span>
<span class="n">validation_steps</span><span class="o">=</span><span class="n">NUM_TEST</span> <span class="o">//</span> <span class="n">batch_size</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">use_multiprocessing</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">workers</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
<span class="p">)</span>
</pre>
</div>
<p>Another technique to make the model representation more relevant for the problem at hand is called fine-tuning. That is based on the following intuition.</p>
<p><span>Earlier layers in the convolutional base encode more generic, reusable features, while layers higher up encode more specialized features.</span></p>
<p>The steps for fine-tuning a network are as follows:</p>
<ul>
<li>1) Add your custom network on top of an already trained base network.</li>
<li>2) Freeze the base network.</li>
<li>3) Train the part you added.</li>
<li>4) Unfreeze some layers in the base network.</li>
<li>5) Jointly train both these layers and the part you added.</li>
</ul>
<p>We have already done the first three steps. To find out which layers to unfreeze, it is helpful to plot the Keras model.</p>
<div class="highlight">
<pre><span class="kn">from</span> <span class="nn">tensorflow.keras.utils</span> <span class="kn">import</span> <span class="n">plot_model</span>
<span class="n">plot_model</span><span class="p">(</span><span class="n">conv_base</span><span class="p">,</span> <span class="n">to_file</span><span class="o">=</span><span class="s">'conv_base.png'</span><span class="p">,</span> <span class="n">show_shapes</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="kn">from</span> <span class="nn">IPython.display</span> <span class="kn">import</span> <span class="n">Image</span>
<span class="n">Image</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s">'conv_base.png'</span><span class="p">)</span>
</pre>
</div>
<p>Here is the zoomed-in view of the last several layers in the convolutional base model.</p>
<p><img alt="fine_tuning" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/36894ad880dc3e645513efc36cc070c4cd0d3d7c/images/efficientnet/fine_tuning.png"/></p>
<p>We then set '<code>multiply_16</code>' and the successive layers to be trainable.</p>
<div class="highlight">
<pre><span class="n">conv_base</span><span class="o">.</span><span class="n">trainable</span> <span class="o">=</span> <span class="bp">True</span>
<span class="n">set_trainable</span> <span class="o">=</span> <span class="bp">False</span>
<span class="k">for</span> <span class="n">layer</span> <span class="ow">in</span> <span class="n">conv_base</span><span class="o">.</span><span class="n">layers</span><span class="p">:</span>
<span class="k">if</span> <span class="n">layer</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="s">'multiply_16'</span><span class="p">:</span>
<span class="n">set_trainable</span> <span class="o">=</span> <span class="bp">True</span>
<span class="k">if</span> <span class="n">set_trainable</span><span class="p">:</span>
<span class="n">layer</span><span class="o">.</span><span class="n">trainable</span> <span class="o">=</span> <span class="bp">True</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">layer</span><span class="o">.</span><span class="n">trainable</span> <span class="o">=</span> <span class="bp">False</span>
</pre>
</div>
<p>Then you can compile and train the model again for some more epochs. Finally, you will have a fine-tuned model with a 9% increase in validation accuracy.</p>
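<p>One detail worth noting: after toggling <code>trainable</code>, the model must be compiled again for the change to take effect. A minimal sketch, reusing the earlier training setup (the learning rate here is illustrative):</p>
<div class="highlight">
<pre>from tensorflow.keras import optimizers

# Recompile so the new trainable flags are picked up; keep the learning
# rate low so fine-tuning only gently adjusts the unfrozen weights.
model.compile(
    loss="categorical_crossentropy",
    optimizer=optimizers.RMSprop(lr=2e-5),
    metrics=["acc"],
)
# Then call model.fit_generator(...) again with the same generators.
</pre>
</div>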
<h2><span>Conclusion and Further reading</span></h2>
<p>This post starts with a brief introduction to EfficientNet and why it is more efficient compared to the classical ResNet model. A runnable Colab Notebook example shows how to build a model by reusing the convolutional base of EfficientNet and fine-tuning the last several layers on a custom dataset.</p>
<p><span>The full source code is available on <a href="https://github.com/Tony607/efficientnet_keras_transfer_learning">my GitHub repo</a>.</span></p>
<h4><span>You might find the following resources helpful.</span></h4>
<p><a href="https://arxiv.org/abs/1905.11946">EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks</a></p>
<p><a href="https://arxiv.org/abs/1801.04381">MobileNetV2: Inverted Residuals and Linear Bottlenecks</a></p>
<p><a href="https://arxiv.org/abs/1709.01507">Squeeze-and-Excitation Networks</a></p>
<p><a href="https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet">TensorFlow implementation of EfficientNet</a></p>
How to compress your Keras model x5 smaller with TensorFlow model optimization2019-05-19T12:13:11+00:002024-03-19T06:54:38+00:00Chengweihttps://www.dlology.com/blog/author/Chengwei/https://www.dlology.com/blog/how-to-compress-your-keras-model-x5-smaller-with-tensorflow-model-optimization/<p><img alt="prune" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/18832addd5a67dd1f16942d4545a318534846b65/images/tf2/prune.png"/></p>
<p>This tutorial will demonstrate how you can reduce the size of your Keras model by 5 times with <a href="https://www.tensorflow.org/model_optimization">TensorFlow model optimization</a>, which can be particularly important for deployment in resource-constrained environments.</p>
<p><span>From the official TensorFlow model optimization documentation: </span><span>weight pruning means eliminating unnecessary values in weight tensors. We set the neural network parameters' values to zero to remove what we estimate are unnecessary connections between the layers of a neural network. This is done during the training process to allow the neural network to adapt to the changes.</span></p>
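<p>A toy numpy example conveys the idea of magnitude-based pruning; this is only an illustration, not the library's internals:</p>
<div class="highlight">
<pre>import numpy as np

# Zero out the 50% of weights with the smallest absolute values.
w = np.array([0.01, -0.8, 0.05, 0.6, -0.02, 0.3])
threshold = np.quantile(np.abs(w), 0.5)
pruned = np.where(np.abs(w) >= threshold, w, 0.0)
print(pruned)  # [ 0.  -0.8  0.   0.6  0.   0.3]
</pre>
</div>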
<p>Here is a breakdown of how you can adopt this technique.</p>
<ol>
<li>Train Keras model to reach an acceptable accuracy as always.</li>
<li>Make Keras layers or model ready to be pruned.</li>
<li>Create a pruning schedule and train the model for more epochs.</li>
<li>Export the pruned model by stripping the pruning wrappers from the model.</li>
<li>Convert Keras model to TensorFlow Lite with optional quantization.</li>
</ol>
<h2>Prune your pre-trained Keras model</h2>
<p>Your pre-trained model has already achieved desirable accuracy; now you want to cut down its size while maintaining performance. The <span>pruning API can help you make it happen.</span></p>
<p><span>To use the pruning API, install the <code>tensorflow-model-optimization</code><span><span> </span>and<span> </span></span><code>tf-nightly</code><span><span> </span>packages.</span></span></p>
<div class="highlight">
<pre><span class="n">pip</span> <span class="n">uninstall</span> <span class="o">-</span><span class="n">yq</span> <span class="n">tensorflow</span>
<span class="n">pip</span> <span class="n">uninstall</span> <span class="o">-</span><span class="n">yq</span> <span class="n">tf</span><span class="o">-</span><span class="n">nightly</span>
<span class="n">pip</span> <span class="n">install</span> <span class="o">-</span><span class="n">Uq</span> <span class="n">tf</span><span class="o">-</span><span class="n">nightly</span><span class="o">-</span><span class="n">gpu</span>
<span class="n">pip</span> <span class="n">install</span> <span class="o">-</span><span class="n">q</span> <span class="n">tensorflow</span><span class="o">-</span><span class="n">model</span><span class="o">-</span><span class="n">optimization</span>
</pre>
</div>
<p>Then you can load your previously trained model and make it "prunable". <span>The Keras-based API can be applied at the level of individual layers, or to the entire model. Since you have the entire model pre-trained, it is easier to apply pruning to the entire model. The algorithm will be applied to all layers capable of weight pruning.</span></p>
<p><span>For the pruning schedule, we start at the sparsity level 50% and gradually train the model to reach 90% sparsity. X% sparsity means that X% of the weight tensor is going to be pruned away.</span></p>
<p><span>Furthermore, we give the model some time to recover after each pruning step, so pruning does not happen on every step. We set the pruning <code>frequency</code> to 100. Similar to pruning a bonsai, we are trimming it gradually so that the tree can adequately heal the wound created during pruning instead of cutting 90% of its branches in one day.</span></p>
<p><span>Given the model already reached a satisfactory accuracy, we can start pruning immediately. As a result, we set the <code>begin_step</code> to 0 here, and only train for another four epochs.</span></p>
<p><span>The end step is calculated from the number of training examples, the batch size, and the total number of epochs to train.</span></p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">tensorflow</span> <span class="kn">as</span> <span class="nn">tf</span>
<span class="kn">from</span> <span class="nn">tensorflow_model_optimization.sparsity</span> <span class="kn">import</span> <span class="n">keras</span> <span class="k">as</span> <span class="n">sparsity</span>
<span class="c1"># Backend agnostic way to save/restore models</span>
<span class="c1"># _, keras_file = tempfile.mkstemp('.h5')</span>
<span class="c1"># print('Saving model to: ', keras_file)</span>
<span class="c1"># tf.keras.models.save_model(model, keras_file, include_optimizer=False)</span>
<span class="c1"># Load the serialized model</span>
<span class="n">loaded_model</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">load_model</span><span class="p">(</span><span class="n">keras_file</span><span class="p">)</span>
<span class="n">epochs</span> <span class="o">=</span> <span class="mi">4</span>
<span class="n">end_step</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="mf">1.0</span> <span class="o">*</span> <span class="n">num_train_samples</span> <span class="o">/</span> <span class="n">batch_size</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">int32</span><span class="p">)</span> <span class="o">*</span> <span class="n">epochs</span>
<span class="k">print</span><span class="p">(</span><span class="n">end_step</span><span class="p">)</span>
<span class="n">new_pruning_params</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">'pruning_schedule'</span><span class="p">:</span> <span class="n">sparsity</span><span class="o">.</span><span class="n">PolynomialDecay</span><span class="p">(</span><span class="n">initial_sparsity</span><span class="o">=</span><span class="mf">0.50</span><span class="p">,</span>
<span class="n">final_sparsity</span><span class="o">=</span><span class="mf">0.90</span><span class="p">,</span>
<span class="n">begin_step</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
<span class="n">end_step</span><span class="o">=</span><span class="n">end_step</span><span class="p">,</span>
<span class="n">frequency</span><span class="o">=</span><span class="mi">100</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">new_pruned_model</span> <span class="o">=</span> <span class="n">sparsity</span><span class="o">.</span><span class="n">prune_low_magnitude</span><span class="p">(</span><span class="n">loaded_model</span><span class="p">,</span> <span class="o">**</span><span class="n">new_pruning_params</span><span class="p">)</span>
<span class="n">new_pruned_model</span><span class="o">.</span><span class="n">summary</span><span class="p">()</span>
<span class="n">new_pruned_model</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span>
<span class="n">loss</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">losses</span><span class="o">.</span><span class="n">categorical_crossentropy</span><span class="p">,</span>
<span class="n">optimizer</span><span class="o">=</span><span class="s">'adam'</span><span class="p">,</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s">'accuracy'</span><span class="p">])</span>
</pre>
</div>
<p>Don't panic if you find more trainable parameters in the <code>new_pruned_model</code> summary; those come from the <span>pruning wrappers, which we will remove later.</span></p>
<p>Now let's start training and pruning the model.</p>
<div class="highlight">
<pre><span class="c1"># Add a pruning step callback to peg the pruning step to the optimizer's</span>
<span class="c1"># step. Also add a callback to add pruning summaries to tensorboard</span>
<span class="n">callbacks</span> <span class="o">=</span> <span class="p">[</span>
<span class="n">sparsity</span><span class="o">.</span><span class="n">UpdatePruningStep</span><span class="p">(),</span>
<span class="n">sparsity</span><span class="o">.</span><span class="n">PruningSummaries</span><span class="p">(</span><span class="n">log_dir</span><span class="o">=</span><span class="n">logdir</span><span class="p">,</span> <span class="n">profile_batch</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="p">]</span>
<span class="n">new_pruned_model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span>
<span class="n">batch_size</span><span class="o">=</span><span class="n">batch_size</span><span class="p">,</span>
<span class="n">epochs</span><span class="o">=</span><span class="n">epochs</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">callbacks</span><span class="o">=</span><span class="n">callbacks</span><span class="p">,</span>
<span class="n">validation_data</span><span class="o">=</span><span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">))</span>
<span class="n">score</span> <span class="o">=</span> <span class="n">new_pruned_model</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Test loss:'</span><span class="p">,</span> <span class="n">score</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Test accuracy:'</span><span class="p">,</span> <span class="n">score</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
</pre>
</div>
<p>The test loss and accuracy of the pruned model should look similar to your original Keras model.</p>
<h2>Export the pruned model</h2>
<p><span>Those </span><span>pruning wrappers can be removed easily like this, after which the total number of parameters should be the same as your original model.</span></p>
<div class="highlight">
<pre><span class="n">final_model</span> <span class="o">=</span> <span class="n">sparsity</span><span class="o">.</span><span class="n">strip_pruning</span><span class="p">(</span><span class="n">pruned_model</span><span class="p">)</span>
<span class="n">final_model</span><span class="o">.</span><span class="n">summary</span><span class="p">()</span>
</pre>
</div>
<p><span>Now you can check what percentage of the weights were pruned by counting how many of them are zero.</span></p>
<div class="highlight">
<pre><span class="kn">from</span> <span class="nn">tensorflow.keras.models</span> <span class="kn">import</span> <span class="n">load_model</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">load_model</span><span class="p">(</span><span class="n">final_model</span><span class="p">)</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">w</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">get_weights</span><span class="p">()):</span>
<span class="k">print</span><span class="p">(</span>
<span class="s">"{} -- Total:{}, Zeros: {:.2f}%"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
<span class="n">model</span><span class="o">.</span><span class="n">weights</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="n">w</span><span class="o">.</span><span class="n">size</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="n">w</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="o">/</span> <span class="n">w</span><span class="o">.</span><span class="n">size</span> <span class="o">*</span> <span class="mi">100</span>
<span class="p">)</span>
<span class="p">)</span>
</pre>
</div>
<p>Here are the results. As you can see, about 90% of the convolution, dense, and batch norm layers' weights are pruned.</p>
<table class="table table-striped">
<tbody>
<tr>
<td width="64">name</td>
<td width="64">Total para</td>
<td width="115">Pruned%</td>
</tr>
<tr>
<td>conv2d_2/kernel:0</td>
<td>800</td>
<td>89.12%</td>
</tr>
<tr>
<td>conv2d_2/bias:0</td>
<td>32</td>
<td>0.00%</td>
</tr>
<tr>
<td>batch_normalization_1/gamma:0</td>
<td>32</td>
<td>0.00%</td>
</tr>
<tr>
<td>batch_normalization_1/beta:0</td>
<td>32</td>
<td>0.00%</td>
</tr>
<tr>
<td>conv2d_3/kernel:0</td>
<td>32</td>
<td>0.00%</td>
</tr>
<tr>
<td>conv2d_3/bias:0</td>
<td>32</td>
<td>0.00%</td>
</tr>
<tr>
<td>dense_2/kernel:0</td>
<td>51200</td>
<td>89.09%</td>
</tr>
<tr>
<td>dense_2/bias:0</td>
<td>64</td>
<td>0.00%</td>
</tr>
<tr>
<td>dense_3/kernel:0</td>
<td>3211264</td>
<td>89.09%</td>
</tr>
<tr>
<td>dense_3/bias:0</td>
<td>1024</td>
<td>0.00%</td>
</tr>
<tr>
<td>batch_normalization_1/moving_mean:0</td>
<td>10240</td>
<td>89.09%</td>
</tr>
<tr>
<td>batch_normalization_1/moving_variance:0</td>
<td>10</td>
<td>0.00%</td>
</tr>
</tbody>
</table>
<p><span>Now, simply by applying a generic file compression algorithm (e.g. zip), the Keras model becomes 5x smaller.</span></p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">tempfile</span>
<span class="kn">import</span> <span class="nn">zipfile</span>
<span class="n">_</span><span class="p">,</span> <span class="n">new_pruned_keras_file</span> <span class="o">=</span> <span class="n">tempfile</span><span class="o">.</span><span class="n">mkstemp</span><span class="p">(</span><span class="s">".h5"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Saving pruned model to: "</span><span class="p">,</span> <span class="n">new_pruned_keras_file</span><span class="p">)</span>
<span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">save_model</span><span class="p">(</span><span class="n">final_model</span><span class="p">,</span> <span class="n">new_pruned_keras_file</span><span class="p">,</span> <span class="n">include_optimizer</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="c1"># Zip the .h5 model file</span>
<span class="n">_</span><span class="p">,</span> <span class="n">zip3</span> <span class="o">=</span> <span class="n">tempfile</span><span class="o">.</span><span class="n">mkstemp</span><span class="p">(</span><span class="s">".zip"</span><span class="p">)</span>
<span class="k">with</span> <span class="n">zipfile</span><span class="o">.</span><span class="n">ZipFile</span><span class="p">(</span><span class="n">zip3</span><span class="p">,</span> <span class="s">"w"</span><span class="p">,</span> <span class="n">compression</span><span class="o">=</span><span class="n">zipfile</span><span class="o">.</span><span class="n">ZIP_DEFLATED</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">new_pruned_keras_file</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span>
<span class="s">"Size of the pruned model before compression: </span><span class="si">%.2f</span><span class="s"> Mb"</span>
<span class="o">%</span> <span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">getsize</span><span class="p">(</span><span class="n">new_pruned_keras_file</span><span class="p">)</span> <span class="o">/</span> <span class="nb">float</span><span class="p">(</span><span class="mi">2</span> <span class="o">**</span> <span class="mi">20</span><span class="p">))</span>
<span class="p">)</span>
<span class="k">print</span><span class="p">(</span>
<span class="s">"Size of the pruned model after compression: </span><span class="si">%.2f</span><span class="s"> Mb"</span>
<span class="o">%</span> <span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">getsize</span><span class="p">(</span><span class="n">zip3</span><span class="p">)</span> <span class="o">/</span> <span class="nb">float</span><span class="p">(</span><span class="mi">2</span> <span class="o">**</span> <span class="mi">20</span><span class="p">))</span>
<span class="p">)</span>
</pre>
</div>
<p>Here is what you get, x5 times smaller model.</p>
<p>Size of the pruned model before compression: <strong>12.52 Mb</strong><br/>Size of the pruned model after compression: <strong>2.51 Mb</strong></p>
<h2><span>Convert Keras model to TensorFlow Lite</span></h2>
<p><span>Tensorflow Lite is an example format you can use to deploy to mobile devices. To convert to a Tensorflow Lite graph, it is necessary to use the <code>TFLiteConverter</code> as below:</span></p>
<div class="highlight">
<pre><span class="c1"># Create the .tflite file</span>
<span class="n">tflite_model_file</span> <span class="o">=</span> <span class="s">"/tmp/sparse_mnist.tflite"</span>
<span class="n">converter</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">lite</span><span class="o">.</span><span class="n">TFLiteConverter</span><span class="o">.</span><span class="n">from_keras_model_file</span><span class="p">(</span><span class="n">pruned_keras_file</span><span class="p">)</span>
<span class="n">tflite_model</span> <span class="o">=</span> <span class="n">converter</span><span class="o">.</span><span class="n">convert</span><span class="p">()</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">tflite_model_file</span><span class="p">,</span> <span class="s">"wb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">tflite_model</span><span class="p">)</span>
</pre>
</div>
<p><span>Then you can use the same technique to zip the <code>tflite</code> file and shrink it roughly 5x, as in the sketch below.</span></p>
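<p>For completeness, here is a minimal sketch of that zipping step, mirroring the <code>.h5</code> compression code above (variable names follow the earlier snippets):</p>
<div class="highlight">
<pre>import os
import tempfile
import zipfile

# Zip the .tflite file, same technique as the .h5 model above.
_, zip_tflite = tempfile.mkstemp(".zip")
with zipfile.ZipFile(zip_tflite, "w", compression=zipfile.ZIP_DEFLATED) as f:
    f.write(tflite_model_file)
print("Size of the tflite model before compression: %.2f Mb"
      % (os.path.getsize(tflite_model_file) / float(2 ** 20)))
print("Size of the tflite model after compression: %.2f Mb"
      % (os.path.getsize(zip_tflite) / float(2 ** 20)))
</pre>
</div>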
<p>Post-training quantization <span>converts weights to 8-bit precision as part of the conversion from a Keras model to TFLite's flat buffer format, resulting in another 4x reduction in model size. Just add the following line to the previous snippet before calling <code>convert()</code>.</span></p>
<div class="highlight">
<pre><span class="n">converter</span><span class="o">.</span><span class="n">optimizations</span> <span class="o">=</span> <span class="p">[</span><span class="n">tf</span><span class="o">.</span><span class="n">lite</span><span class="o">.</span><span class="n">Optimize</span><span class="o">.</span><span class="n">OPTIMIZE_FOR_SIZE</span><span class="p">]</span>
</pre>
</div>
<p><span>The compressed 8-bit TensorFlow Lite model only takes 0.60 Mb compared to the original Keras model's 12.52 Mb while maintaining comparable test accuracy. That is a 16x size reduction overall.</span></p>
<p><span>You can evaluate the accuracy of the converted TensorFlow Lite model as shown below, where <code>eval_model</code> is fed the test dataset.</span></p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="n">interpreter</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">lite</span><span class="o">.</span><span class="n">Interpreter</span><span class="p">(</span><span class="n">model_path</span><span class="o">=</span><span class="nb">str</span><span class="p">(</span><span class="n">tflite_model_file</span><span class="p">))</span>
<span class="n">interpreter</span><span class="o">.</span><span class="n">allocate_tensors</span><span class="p">()</span>
<span class="n">input_index</span> <span class="o">=</span> <span class="n">interpreter</span><span class="o">.</span><span class="n">get_input_details</span><span class="p">()[</span><span class="mi">0</span><span class="p">][</span><span class="s">"index"</span><span class="p">]</span>
<span class="n">output_index</span> <span class="o">=</span> <span class="n">interpreter</span><span class="o">.</span><span class="n">get_output_details</span><span class="p">()[</span><span class="mi">0</span><span class="p">][</span><span class="s">"index"</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">eval_model</span><span class="p">(</span><span class="n">interpreter</span><span class="p">,</span> <span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">):</span>
<span class="n">total_seen</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">num_correct</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">img</span><span class="p">,</span> <span class="n">label</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">):</span>
<span class="n">inp</span> <span class="o">=</span> <span class="n">img</span><span class="o">.</span><span class="n">reshape</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="mi">28</span><span class="p">,</span> <span class="mi">28</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">total_seen</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="n">interpreter</span><span class="o">.</span><span class="n">set_tensor</span><span class="p">(</span><span class="n">input_index</span><span class="p">,</span> <span class="n">inp</span><span class="p">)</span>
<span class="n">interpreter</span><span class="o">.</span><span class="n">invoke</span><span class="p">()</span>
<span class="n">predictions</span> <span class="o">=</span> <span class="n">interpreter</span><span class="o">.</span><span class="n">get_tensor</span><span class="p">(</span><span class="n">output_index</span><span class="p">)</span>
<span class="k">if</span> <span class="n">np</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">predictions</span><span class="p">)</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">label</span><span class="p">):</span>
<span class="n">num_correct</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">total_seen</span> <span class="o">%</span> <span class="mi">1000</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Accuracy after </span><span class="si">%i</span><span class="s"> images: </span><span class="si">%f</span><span class="s">"</span> <span class="o">%</span>
<span class="p">(</span><span class="n">total_seen</span><span class="p">,</span> <span class="nb">float</span><span class="p">(</span><span class="n">num_correct</span><span class="p">)</span> <span class="o">/</span> <span class="nb">float</span><span class="p">(</span><span class="n">total_seen</span><span class="p">)))</span>
<span class="k">return</span> <span class="nb">float</span><span class="p">(</span><span class="n">num_correct</span><span class="p">)</span> <span class="o">/</span> <span class="nb">float</span><span class="p">(</span><span class="n">total_seen</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">eval_model</span><span class="p">(</span><span class="n">interpreter</span><span class="p">,</span> <span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">))</span>
</pre>
</div>
<h2>Conclusion and Further reading</h2>
<p><span>In this tutorial, we showed you how to create </span><em>sparse models</em><span> with the TensorFlow model optimization toolkit weight pruning API. Right now, this allows you to create models that take significantly less space on disk. The sparsity could also be exploited to skip computation at inference time; in the future, TensorFlow Lite will provide such capabilities.</span></p>
<p><span><span>Check out the official <a href="https://www.tensorflow.org/model_optimization">TensorFlow model optimization</a> page and their <a href="https://github.com/tensorflow/model-optimization">GitHub page</a> for more information.</span></span></p>
<h4><em>The source code for this post is available on <a href="https://github.com/Tony607/prune-keras">my Github</a> and runnable on <a href="https://colab.research.google.com/github/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/g3doc/guide/pruning/pruning_with_keras.ipynb">Google Colab Notebook</a>.</em></h4>How to run Tensorboard for PyTorch 1.1.0 inside Jupyter notebook2019-05-09T10:57:05+00:002024-03-19T09:53:30+00:00Chengweihttps://www.dlology.com/blog/author/Chengwei/https://www.dlology.com/blog/how-to-run-tensorboard-for-pytorch-110-inside-jupyter-notebook/<p><img alt="tb_pytorch_nb" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/8da97ba29a0c9b8ac91df0a793351099d9bdc681/images/pytorch/tb_pytorch_nb.png"/></p>
<p>Facebook introduced PyTorch 1.1 with TensorBoard support. Let's try it out really quickly on Colab's Jupyter Notebook.</p>
<p>No need to install anything locally on your development machine. Google's Colab comes in handy free of charge, even with its upgraded Tesla T4 GPU.</p>
<p>Firstly, let's create a <a href="https://colab.research.google.com">Colab notebook</a> or open <strong><a href="https://github.com/Tony607/pytorch-tensorboard/blob/master/PyTorch_1_1_0_tensorboard.ipynb">this one I made</a></strong>.</p>
<p>Type this in the first cell to check that the PyTorch version is at least 1.1.0.</p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">torch</span>
<span class="n">torch</span><span class="o">.</span><span class="n">__version__</span>
</pre>
</div>
<p>Then install the cutting-edge TensorBoard build like this.</p>
<div class="highlight">
<pre><span class="err">!</span><span class="n">pip</span> <span class="n">install</span> <span class="o">-</span><span class="n">q</span> <span class="n">tb</span><span class="o">-</span><span class="n">nightly</span>
</pre>
</div>
<p>The output might remind you to restart the runtime to make the new TensorBoard take effect. You can click through <code>Runtime -> Restart runtime...</code>.</p>
<p>Next, load the TensorBoard notebook extension with this magic line.</p>
<div class="highlight">
<pre><span class="o">%</span><span class="n">load_ext</span> <span class="n">tensorboard</span>
</pre>
</div>
<p>After that, you can start exploring the <a href="https://pytorch.org/docs/stable/tensorboard.html">TORCH.UTILS.TENSORBOARD</a> API. These <span>utilities let you log PyTorch models and metrics into a directory for visualization within the TensorBoard UI. Scalars, images, histograms, graphs, and embedding visualizations are all supported for PyTorch models and tensors.</span></p>
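<p>For instance, here is a minimal sketch (not part of the official demo below) that logs a toy scalar curve you can watch in the TensorBoard UI:</p>
<div class="highlight">
<pre>from torch.utils.tensorboard import SummaryWriter

# Log a toy scalar curve; TensorBoard plots it under the "demo" tag.
writer = SummaryWriter()
for step in range(100):
    writer.add_scalar("demo/linear", 0.5 * step, global_step=step)
writer.close()
</pre>
</div>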
<p><span>The <code>SummaryWriter</code> class is your main entry point to log data for consumption and visualization by TensorBoard. Let's run this official demo with the MNIST dataset and a ResNet50 model.</span></p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">torch</span>
<span class="kn">import</span> <span class="nn">torchvision</span>
<span class="kn">from</span> <span class="nn">torch.utils.tensorboard</span> <span class="kn">import</span> <span class="n">SummaryWriter</span>
<span class="kn">from</span> <span class="nn">torchvision</span> <span class="kn">import</span> <span class="n">datasets</span><span class="p">,</span> <span class="n">transforms</span>
<span class="c1"># Writer will output to ./runs/ directory by default</span>
<span class="n">writer</span> <span class="o">=</span> <span class="n">SummaryWriter</span><span class="p">()</span>
<span class="n">transform</span> <span class="o">=</span> <span class="n">transforms</span><span class="o">.</span><span class="n">Compose</span><span class="p">([</span><span class="n">transforms</span><span class="o">.</span><span class="n">ToTensor</span><span class="p">(),</span> <span class="n">transforms</span><span class="o">.</span><span class="n">Normalize</span><span class="p">((</span><span class="mf">0.5</span><span class="p">,),</span> <span class="p">(</span><span class="mf">0.5</span><span class="p">,))])</span>
<span class="n">trainset</span> <span class="o">=</span> <span class="n">datasets</span><span class="o">.</span><span class="n">MNIST</span><span class="p">(</span><span class="s">'mnist_train'</span><span class="p">,</span> <span class="n">train</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">download</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">transform</span><span class="o">=</span><span class="n">transform</span><span class="p">)</span>
<span class="n">trainloader</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">DataLoader</span><span class="p">(</span><span class="n">trainset</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">torchvision</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">resnet50</span><span class="p">(</span><span class="bp">False</span><span class="p">)</span>
<span class="c1"># Have ResNet model take in grayscale rather than RGB</span>
<span class="n">model</span><span class="o">.</span><span class="n">conv1</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Conv2d</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span> <span class="n">stride</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">padding</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">images</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="nb">iter</span><span class="p">(</span><span class="n">trainloader</span><span class="p">))</span>
<span class="n">grid</span> <span class="o">=</span> <span class="n">torchvision</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">make_grid</span><span class="p">(</span><span class="n">images</span><span class="p">)</span>
<span class="n">writer</span><span class="o">.</span><span class="n">add_image</span><span class="p">(</span><span class="s">'images'</span><span class="p">,</span> <span class="n">grid</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">writer</span><span class="o">.</span><span class="n">add_graph</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">images</span><span class="p">)</span>
<span class="n">writer</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</pre>
</div>
<p>You just wrote an image and the model graph data to a TensorBoard summary. The writer saves the output files to the "./runs" directory by default.</p>
<p>Let's run TensorBoard to visualize them.</p>
<div class="highlight">
<pre><span class="o">%</span><span class="n">tensorboard</span> <span class="o">--</span><span class="n">logdir</span><span class="o">=</span><span class="n">runs</span>
</pre>
</div>
<p>That's it, you have it!</p>
<p><img alt="tb_pytorch" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/8da97ba29a0c9b8ac91df0a793351099d9bdc681/images/pytorch/tb_pytorch.png"/></p>
<h2>Summary and Further reading</h2>
<p>This short tutorial gets you started running TensorBoard with the latest PyTorch 1.1.0 in a Jupyter Notebook. Keep playing around with the other features supported by PyTorch's TensorBoard integration.</p>
<p>Read the official API document here - <a href="https://pytorch.org/docs/stable/tensorboard.html">TORCH.UTILS.TENSORBOARD</a></p>How to run Keras model on RK3399Pro2019-05-02T13:09:02+00:002024-03-19T06:07:55+00:00Chengweihttps://www.dlology.com/blog/author/Chengwei/https://www.dlology.com/blog/how-to-run-keras-model-on-rk3399pro/<p><img alt="keras-tb" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/15e507f6c3c47cf316b89096f5f5720269625894/images/rk3399pro/keras-tb.png"/></p>
<p>Previously, we have introduced and benchmarked several embedded edge computing solutions, including <a href="https://www.dlology.com/blog/how-to-run-keras-model-inference-x3-times-faster-with-cpu-and-intel-openvino-1/">OpenVINO</a> for Intel neural compute sticks, <a href="https://www.dlology.com/blog/how-to-run-deep-learning-model-on-microcontroller-with-cmsis-nn/">CMSIS-NN</a> for ARM microcontrollers, and TensorRT models on <a href="https://www.dlology.com/blog/how-to-run-keras-model-on-jetson-nano/">Jetson Nano</a>.</p>
<p>What they have in common is that each hardware provider offers its own tools and APIs to quantize a TensorFlow graph and fuse adjacent layers to accelerate inference.</p>
<p>This time we will take a look at the RockChip RK3399Pro SoC with a built-in NPU (Neural Compute Unit) rated at 2.4 TOPS at 8-bit precision, capable of running the Inception V3 model at over 28 FPS. You will see that deploying a Keras model to the board is quite similar to the previously mentioned solutions.</p>
<ol>
<li>Freeze the Keras model to a TensorFlow graph and create the inference model with the RKNN Toolkit.</li>
<li>Load the<span> RKNN</span><span> model</span> on an <span>RK3399Pro dev board</span> and make predictions.</li>
</ol>
<p>Let's get started with the first time setup.</p>
<h2>Setup <span>RK3399Pro board</span></h2>
<p>Any dev board with an RK3399Pro SoC like the <a href="https://www.amazon.com/Toybrick-Development-Artificial-Intelligence-Acceleration/dp/B07P3M7683">Rockchip Toybrick RK3399PRO Board</a> or the <a href="http://shop.t-firefly.com/goods.php?id=98">Firefly Core-3399Pro</a> should work. I have a Rockchip Toybrick RK3399PRO Board with 6GB of RAM (2GB dedicated to the NPU).</p>
<p>The board comes with many connectors and interfaces, similar to the Jetson Nano. One thing worth mentioning: the HDMI connector doesn't work with my monitor; however, I was able to get a USB Type-C to HDMI adapter working.</p>
<p><img alt="tb-rk3399pro" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/15e507f6c3c47cf316b89096f5f5720269625894/images/rk3399pro/tb-rk3399pro.png"/></p>
<p>It has Fedora Linux release 28 preinstalled with the default username and password "toybrick".</p>
<p>The <span>RK3399Pro has a 6-core 64-bit CPU with the <a href="https://en.wikipedia.org/wiki/ARM_architecture#AArch64">aarch64 architecture</a>, the same architecture as the <a href="https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-nano/">Jetson Nano</a> but quite different from the Raspberry Pi 3B+, which is ARMv7 32-bit only. This means any precompiled Python wheel packages targeting the Raspberry Pi will likely not work with the RK3399Pro or Jetson Nano. But don't despair: you can download the precompiled aarch64 Python wheel package files from my <a href="https://coding.net/u/zcw607/p/aarch64_python_packages/git">aarch64_python_packages</a> repo, including scipy, onnx, tensorflow and rknn_toolkit from their<a href="https://github.com/rockchip-toybrick/RKNPUTool/tree/master/rknn-toolkit/package"> official GitHub</a>.</span></p>
<p><span>Transfer those wheel files to the RK3399Pro board then run the following command.</span></p>
<pre><span>sudo dnf update -y<br/>sudo dnf install -y cmake gcc gcc-c++ protobuf-devel protobuf-compiler lapack-devel<br/>sudo dnf install -y python3-devel python3-opencv python3-numpy-f2py python3-h5py python3-lmdb<br/>sudo dnf install -y python3-grpcio<br/><br/>sudo pip3 install scipy-1.2.0-cp36-cp36m-linux_aarch64.whl<br/>sudo pip3 install onnx-1.4.1-cp36-cp36m-linux_aarch64.whl<br/>sudo pip3 install tensorflow-1.10.1-cp36-cp36m-linux_aarch64.whl<br/>sudo pip3 install rknn_toolkit-0.9.9-cp36-cp36m-linux_aarch64.whl<br/></span></pre>
<div>
<div>This might take a while depending on your internet connection speed.<br/><span></span></div>
</div>
<div>
<h2><span>Step1: Freeze Keras model and convert to RKNN model</span></h2>
<p><span>The conversion from a TensorFlow graph to an RKNN model will take considerable time if you choose to run it on the development board.</span> So it is recommended to use a Linux development machine, which could be Windows WSL, an Ubuntu VM, or even <a href="https://colab.research.google.com">Google Colab</a>.</p>
<p>To set up your development machine for the first time, install the RKNN Toolkit; you can find the <span>wheel package files on their<a href="https://github.com/rockchip-toybrick/RKNPUTool/tree/master/rknn-toolkit/package"> official GitHub</a>.</span></p>
<div class="highlight">
<pre><span class="n">pip3</span> <span class="n">install</span> <span class="o">-</span><span class="n">U</span> <span class="n">tensorflow</span> <span class="n">scipy</span> <span class="n">onnx</span>
<span class="n">pip3</span> <span class="n">install</span> <span class="n">rknn_toolkit</span><span class="o">-</span><span class="mf">0.9</span><span class="o">.</span><span class="mi">9</span><span class="o">-</span><span class="n">cp36</span><span class="o">-</span><span class="n">cp36m</span><span class="o">-</span><span class="n">linux_x86_64</span><span class="o">.</span><span class="n">whl</span>
<span class="c1"># Or if you have Python 3.5</span>
<span class="c1"># pip3 install rknn_toolkit-0.9.9-cp35-cp35m-linux_x86_64.whl</span>
</pre>
</div>
<p>Freezing a Keras model to a single <code>.pb</code> file is similar to previous tutorials. You can find the code in <a href="https://github.com/Tony607/Keras_RK3399pro/freeze_graph.py">freeze_graph.py</a> on GitHub. Once it is done, you will have an ImageNet InceptionV3 frozen model <span>that accepts inputs with shape </span><span><code>(N, 299, 299, 3)</code>. The sketch below shows the gist of the freezing step.</span></p>
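<p>Here is a minimal sketch of that freezing step, assuming a TF 1.x environment with Keras on the TensorFlow backend (see freeze_graph.py in the repo for the exact script):</p>
<div class="highlight">
<pre>import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.applications import InceptionV3

K.set_learning_phase(0)  # inference mode: disable dropout and BN updates
model = InceptionV3(weights="imagenet")

sess = K.get_session()
# Bake the variables into constants so the graph is self-contained.
frozen_graph = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), [out.op.name for out in model.outputs])
tf.train.write_graph(frozen_graph, "./model", "frozen_model.pb", as_text=False)
</pre>
</div>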
<p><span>Take note of the input and output node names since we will specify them when loading the frozen model with the RKNN Toolkit. For InceptionV3 and many other <a href="https://keras.io/applications/">Keras ImageNet models</a> they will be:</span></p>
<div class="highlight">
<pre><span class="n">INPUT_NODE</span><span class="p">:</span> <span class="p">[</span><span class="s">'input_1'</span><span class="p">]</span>
<span class="n">OUTPUT_NODE</span><span class="p">:</span> <span class="p">[</span><span class="s">'predictions/Softmax'</span><span class="p">]</span></pre>
</div>
<p><span>Then you can run the <a href="https://github.com/Tony607/Keras_RK3399pro/convert_rknn.py">convert_rknn.py</a> script to quantize your model to the uint8 data type, or more specifically, the asymmetric quantized uint8 type.</span></p>
<p>With asymmetric quantization, the quantized range is fully utilized, unlike in symmetric mode. That is because we map the min/max values of the float range exactly to the min/max of the quantized range. Below is an illustration of the two range-based linear quantization methods. You can read more about it <a href="https://nervanasystems.github.io/distiller/algo_quantization.html">here</a>.</p>
<p><img alt="" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/15e507f6c3c47cf316b89096f5f5720269625894/images/rk3399pro/asymmetric-mode.png"/></p>
<p>The <code>rknn.config</code> also allows you to specify the <code>channel_mean_value</code> with a list of 4 values <code>(M0, M1, M2, S0)</code> to automatically normalize uint8 (0~255) image data to different ranges in the inference pipeline. Keras ImageNet models with the TensorFlow backend expect image data values normalized between -1 and 1. To accomplish this, we set the <code>channel_mean_value</code> to <code>"128 128 128 128"</code>, where the first three values are the mean values for each of the RGB color channels and the last value is a scale parameter. The output data is calculated as follows; a quick numeric check appears after the formulas.</p>
<div>
<div>
<div class="highlight">
<pre><span class="n">R_out</span> <span class="o">=</span> <span class="p">(</span><span class="n">R</span> <span class="o">-</span> <span class="n">M0</span><span class="p">)</span><span class="o">/</span><span class="n">S0</span>
<span class="n">G_out</span> <span class="o">=</span> <span class="p">(</span><span class="n">G</span> <span class="o">-</span> <span class="n">M1</span><span class="p">)</span><span class="o">/</span><span class="n">S0</span>
<span class="n">B_out</span> <span class="o">=</span> <span class="p">(</span><span class="n">B</span> <span class="o">-</span> <span class="n">M2</span><span class="p">)</span><span class="o">/</span><span class="n">S0</span>
</pre>
</div>
</div>
</div>
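<p>As a quick sanity check of the <code>"128 128 128 128"</code> setting, plugging the uint8 extremes into the formulas gives the expected [-1, 1) range:</p>
<div class="highlight">
<pre># (pixel - mean) / scale with mean = 128 and scale = 128
for pixel in (0, 128, 255):
    print(pixel, "->", (pixel - 128) / 128.0)
# 0 -> -1.0, 128 -> 0.0, 255 -> 0.9921875
</pre>
</div>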
<p>If you use Python OpenCV to read or capture images, the color channels are in BGR order; in that case, you can set the <code>reorder_channel</code> parameter of <code>rknn.config()</code> to <code>"2 1 0"</code> so the color channels will be reordered to RGB in the inference pipeline.</p>
<div class="highlight">
<pre><span class="kn">from</span> <span class="nn">rknn.api</span> <span class="kn">import</span> <span class="n">RKNN</span>
<span class="n">INPUT_NODE</span> <span class="o">=</span> <span class="p">[</span><span class="s">"input_1"</span><span class="p">]</span>
<span class="n">OUTPUT_NODE</span> <span class="o">=</span> <span class="p">[</span><span class="s">"predictions/Softmax"</span><span class="p">]</span>
<span class="n">img_height</span> <span class="o">=</span> <span class="mi">299</span>
<span class="c1"># Create RKNN object</span>
<span class="n">rknn</span> <span class="o">=</span> <span class="n">RKNN</span><span class="p">()</span>
<span class="c1"># pre-process config</span>
<span class="c1"># channel_mean_value "0 0 0 255" while normalize the image data to range [0, 1]</span>
<span class="c1"># channel_mean_value "128 128 128 128" while normalize the image data to range [-1, 1]</span>
<span class="c1"># reorder_channel "0 1 2" will keep the color channel, "2 1 0" will swap the R and B channel,</span>
<span class="c1"># i.e. if the input is BGR loaded by cv2.imread, it will convert it to RGB for the model input.</span>
<span class="c1"># need_horizontal_merge is suggested for inception models (v1/v3/v4).</span>
<span class="n">rknn</span><span class="o">.</span><span class="n">config</span><span class="p">(</span>
<span class="n">channel_mean_value</span><span class="o">=</span><span class="s">"128 128 128 128"</span><span class="p">,</span>
<span class="n">reorder_channel</span><span class="o">=</span><span class="s">"0 1 2"</span><span class="p">,</span>
<span class="n">need_horizontal_merge</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">quantized_dtype</span><span class="o">=</span><span class="s">"asymmetric_quantized-u8"</span><span class="p">,</span>
<span class="p">)</span>
<span class="c1"># Load tensorflow model</span>
<span class="n">ret</span> <span class="o">=</span> <span class="n">rknn</span><span class="o">.</span><span class="n">load_tensorflow</span><span class="p">(</span>
<span class="n">tf_pb</span><span class="o">=</span><span class="s">"./model/frozen_model.pb"</span><span class="p">,</span>
<span class="n">inputs</span><span class="o">=</span><span class="n">INPUT_NODE</span><span class="p">,</span>
<span class="n">outputs</span><span class="o">=</span><span class="n">OUTPUT_NODE</span><span class="p">,</span>
<span class="n">input_size_list</span><span class="o">=</span><span class="p">[[</span><span class="n">img_height</span><span class="p">,</span> <span class="n">img_height</span><span class="p">,</span> <span class="mi">3</span><span class="p">]],</span>
<span class="p">)</span>
<span class="k">if</span> <span class="n">ret</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Load inception_v3 failed!"</span><span class="p">)</span>
<span class="nb">exit</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span>
<span class="c1"># Build model</span>
<span class="c1"># dataset: A input data set for rectifying quantization parameters.</span>
<span class="n">ret</span> <span class="o">=</span> <span class="n">rknn</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">do_quantization</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">dataset</span><span class="o">=</span><span class="s">"./dataset.txt"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">ret</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Build inception_v3 failed!"</span><span class="p">)</span>
<span class="nb">exit</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span>
<span class="c1"># Export rknn model</span>
<span class="n">ret</span> <span class="o">=</span> <span class="n">rknn</span><span class="o">.</span><span class="n">export_rknn</span><span class="p">(</span><span class="s">"./inception_v3.rknn"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">ret</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Export inception_v3.rknn failed!"</span><span class="p">)</span>
<span class="nb">exit</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span>
</pre>
</div>
<div>After running the script, you will have <code>inception_v3.rknn</code> in the project directory; transfer the file to the dev board for inference.</div>
<div>
<h2><span>Step 2: Loads RKNN</span><span> model and make predictions</span></h2>
<p><span>The inference pipeline takes care of image normalization and color channel reordering as configured in the previous step. What's left for you is loading the model, initializing the runtime environment, and running the inference.</span></p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">cv2</span>
<span class="kn">from</span> <span class="nn">rknn.api</span> <span class="kn">import</span> <span class="n">RKNN</span>
<span class="c1"># Create RKNN object</span>
<span class="n">rknn</span> <span class="o">=</span> <span class="n">RKNN</span><span class="p">()</span>
<span class="n">img_height</span> <span class="o">=</span> <span class="mi">299</span>
<span class="c1"># Direct Load RKNN Model</span>
<span class="n">ret</span> <span class="o">=</span> <span class="n">rknn</span><span class="o">.</span><span class="n">load_rknn</span><span class="p">(</span><span class="s">"./inception_v3.rknn"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">ret</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Load inception_v3.rknn failed!"</span><span class="p">)</span>
<span class="nb">exit</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span>
<span class="c1"># Set inputs</span>
<span class="n">img</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">imread</span><span class="p">(</span><span class="s">"./data/elephant.jpg"</span><span class="p">)</span>
<span class="n">img</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">resize</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="n">dsize</span><span class="o">=</span><span class="p">(</span><span class="n">img_height</span><span class="p">,</span> <span class="n">img_height</span><span class="p">),</span> <span class="n">interpolation</span><span class="o">=</span><span class="n">cv2</span><span class="o">.</span><span class="n">INTER_CUBIC</span><span class="p">)</span>
<span class="c1"># This can opt out if "reorder_channel" is set to "2 1 0"</span>
<span class="c1"># rknn.config() in `convert_rknn.py`</span>
<span class="n">img</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">cvtColor</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="n">cv2</span><span class="o">.</span><span class="n">COLOR_BGR2RGB</span><span class="p">)</span>
<span class="c1"># init runtime environment</span>
<span class="k">print</span><span class="p">(</span><span class="s">"--> Init runtime environment"</span><span class="p">)</span>
<span class="n">ret</span> <span class="o">=</span> <span class="n">rknn</span><span class="o">.</span><span class="n">init_runtime</span><span class="p">(</span><span class="n">target</span><span class="o">=</span><span class="s">"rk3399pro"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">ret</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Init runtime environment failed"</span><span class="p">)</span>
<span class="nb">exit</span><span class="p">(</span><span class="n">ret</span><span class="p">)</span>
<span class="c1"># Inference</span>
<span class="n">outputs</span> <span class="o">=</span> <span class="n">rknn</span><span class="o">.</span><span class="n">inference</span><span class="p">(</span><span class="n">inputs</span><span class="o">=</span><span class="p">[</span><span class="n">img</span><span class="p">])</span>
<span class="n">outputs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">outputs</span><span class="p">)</span>
<span class="n">rknn</span><span class="o">.</span><span class="n">release</span><span class="p">()</span>
</pre>
</div>
</div>
<div><span>The output's shape is (1, 1, 1000), representing the scores for the 1,000 ImageNet classes.</span></div>
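<p>To turn those scores into human-readable labels, one option is Keras's <code>decode_predictions</code> helper; a minimal sketch, assuming the standard ImageNet class ordering:</p>
<div class="highlight">
<pre>from tensorflow.keras.applications.inception_v3 import decode_predictions

# outputs has shape (1, 1, 1000); reshape to (1, 1000) for decode_predictions.
preds = outputs.reshape(1, 1000)
# Prints something like [[("n02504458", "African_elephant", 0.83), ...]]
print(decode_predictions(preds, top=3))
</pre>
</div>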
<div>
<h2><span>Benchmark results</span></h2>
<p><span>Benchmark settings:</span></p>
<ul>
<li><span>Model: Inception V3</span></li>
<li><span>Quantization: uint8</span></li>
<li><span>Input size: (1, 299, 299, 3)</span></li>
</ul>
<p><span>Let's run the inference several times and see how fast it can go.</span></p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">time</span>
<span class="n">times</span> <span class="o">=</span> <span class="p">[]</span>
<span class="c1"># Run inference 20 times and do the average.</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">20</span><span class="p">):</span>
<span class="n">start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
<span class="c1"># Use the API internal call directly.</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">rknn</span><span class="o">.</span><span class="n">rknn_base</span><span class="o">.</span><span class="n">inference</span><span class="p">(</span>
<span class="n">inputs</span><span class="o">=</span><span class="p">[</span><span class="n">img</span><span class="p">],</span> <span class="n">data_type</span><span class="o">=</span><span class="s">"uint8"</span><span class="p">,</span> <span class="n">data_format</span><span class="o">=</span><span class="s">"nhwc"</span><span class="p">,</span> <span class="n">outputs</span><span class="o">=</span><span class="bp">None</span>
<span class="p">)</span>
<span class="c1"># Alternatively, use the external API call.</span>
<span class="c1"># outputs = rknn.inference(inputs=[img])</span>
<span class="n">delta</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">start_time</span>
<span class="n">times</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">delta</span><span class="p">)</span>
<span class="c1"># Calculate the average time for inference.</span>
<span class="n">mean_delta</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">times</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
<span class="n">fps</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">mean_delta</span>
<span class="k">print</span><span class="p">(</span><span class="s">"average(sec):{:.3f},fps:{:.2f}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">mean_delta</span><span class="p">,</span> <span class="n">fps</span><span class="p">))</span>
</pre>
</div>
<p><span></span><span>It achieves an average <strong>FPS of 28.94</strong>, even faster than <a href="https://www.dlology.com/blog/how-to-run-keras-model-on-jetson-nano/">Jetson Nano</a>'s 27.18 FPS running a much smaller MobileNetV2 model.</span></p>
<h2>Conclusion and Further reading</h2>
<p>This post shows you how to get started with an RK3399Pro dev board, then convert and run a Keras image classification model on its NPU at real-time speed.</p>
<h4><em>For the complete source code, check out <a href="https://github.com/Tony607/Keras_RK3399pro">my GitHub repository</a>.</em></h4>
</div>
</div>How to run TensorFlow Object Detection model on Jetson Nano2019-04-21T02:39:02+00:002024-03-19T10:52:18+00:00Chengweihttps://www.dlology.com/blog/author/Chengwei/https://www.dlology.com/blog/how-to-run-tensorflow-object-detection-model-on-jetson-nano/<p><img alt="tf-jetson-nano" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/47b40972ff8c670987d4b0a3a1faa093de07e4dc/images/jetson/tf-jetson-nano.png"/></p>
<p>Previously, you learned how to run a Keras image classification model on the Jetson Nano; this time you will learn how to run a TensorFlow object detection model on it. It could be a pre-trained model from the <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md">Tensorflow detection model zoo</a> which detects everyday objects like people/cars/dogs, or it could be a custom-trained object detection model which detects your custom objects.</p>
<p>For this tutorial, we will convert the SSD MobileNet V1 model trained on the COCO dataset for common object detection.</p>
<p><span>Here is a breakdown of how to make it happen, slightly different from the previous image classification tutorial.</span></p>
<ol>
<li>Download a pre-trained model checkpoint, build a TensorFlow detection graph, then create an inference graph with TensorRT.</li>
<li>Load the <span>TensorRT inference graph</span> on the Jetson Nano and make predictions.</li>
</ol>
<p>These two steps will be handled in two separate Jupyter Notebooks, with the first one running on a development machine and the second one running on the Jetson Nano.</p>
<p>Before going any further, make sure you have <a href="https://www.dlology.com/blog/how-to-run-keras-model-on-jetson-nano/">set up</a> the Jetson Nano and installed TensorFlow.</p>
<h2>Step 1: Create TensorRT model</h2>
<p><span>Run this step on your development machine with </span><a href="https://github.com/tensorflow/tensorrt#installing-tf-trt">Tensorflow nightly builds</a> <span>which include TF-TRT by default or you can run on<span> </span></span><a href="https://colab.research.google.com/github/Tony607/tf_jetson_nano/blob/master/Step1_Object_detection_Colab_TensorRT.ipynb">this Colab notebook</a><span>'s free GPU.</span></p>
<p>In the notebook, you will start by installing the TensorFlow Object Detection API and setting up relevant paths. Its <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md">official installation documentation</a> might look daunting to beginners, but you can do it by running just one notebook cell.</p>
<div class="highlight">
<pre><span class="o">%</span><span class="n">cd</span> <span class="o">/</span><span class="n">content</span>
<span class="err">!</span><span class="n">git</span> <span class="n">clone</span> <span class="o">--</span><span class="n">quiet</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">github</span><span class="o">.</span><span class="n">com</span><span class="o">/</span><span class="n">tensorflow</span><span class="o">/</span><span class="n">models</span><span class="o">.</span><span class="n">git</span>
<span class="err">!</span><span class="n">apt</span><span class="o">-</span><span class="n">get</span> <span class="n">install</span> <span class="o">-</span><span class="n">qq</span> <span class="n">protobuf</span><span class="o">-</span><span class="n">compiler</span> <span class="n">python</span><span class="o">-</span><span class="n">pil</span> <span class="n">python</span><span class="o">-</span><span class="n">lxml</span> <span class="n">python</span><span class="o">-</span><span class="n">tk</span>
<span class="err">!</span><span class="n">pip</span> <span class="n">install</span> <span class="o">-</span><span class="n">q</span> <span class="n">Cython</span> <span class="n">contextlib2</span> <span class="n">pillow</span> <span class="n">lxml</span> <span class="n">matplotlib</span>
<span class="err">!</span><span class="n">pip</span> <span class="n">install</span> <span class="o">-</span><span class="n">q</span> <span class="n">pycocotools</span>
<span class="o">%</span><span class="n">cd</span> <span class="o">/</span><span class="n">content</span><span class="o">/</span><span class="n">models</span><span class="o">/</span><span class="n">research</span>
<span class="err">!</span><span class="n">protoc</span> <span class="n">object_detection</span><span class="o">/</span><span class="n">protos</span><span class="o">/*.</span><span class="n">proto</span> <span class="o">--</span><span class="n">python_out</span><span class="o">=.</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s">'PYTHONPATH'</span><span class="p">]</span> <span class="o">+=</span> <span class="s">':/content/models/research/:/content/models/research/slim/'</span>
<span class="n">sys</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s">"/content/models/research/slim/"</span><span class="p">)</span>
<span class="err">!</span><span class="n">python</span> <span class="n">object_detection</span><span class="o">/</span><span class="n">builders</span><span class="o">/</span><span class="n">model_builder_test</span><span class="o">.</span><span class="n">py</span>
</pre>
</div>
<p>Next, you will download and build a detection graph from the pre-trained <strong>ssd_mobilenet_v1_coco</strong> checkpoint or select another one from the list provided in the Notebook.</p>
<div class="highlight">
<pre><span class="n">config_path</span><span class="p">,</span> <span class="n">checkpoint_path</span> <span class="o">=</span> <span class="n">download_detection_model</span><span class="p">(</span><span class="n">MODEL</span><span class="p">,</span> <span class="s">'data'</span><span class="p">)</span>
<span class="n">frozen_graph</span><span class="p">,</span> <span class="n">input_names</span><span class="p">,</span> <span class="n">output_names</span> <span class="o">=</span> <span class="n">build_detection_graph</span><span class="p">(</span>
<span class="n">config</span><span class="o">=</span><span class="n">config_path</span><span class="p">,</span>
<span class="n">checkpoint</span><span class="o">=</span><span class="n">checkpoint_path</span><span class="p">,</span>
<span class="n">score_threshold</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span>
<span class="n">iou_threshold</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
<span class="n">batch_size</span><span class="o">=</span><span class="mi">1</span>
<span class="p">)</span>
</pre>
</div>
<p>The default TensorFlow object detection model takes a variable batch size; it is now fixed to 1 since the Jetson Nano is a resource-constrained device. In the <code>build_detection_graph</code> call, several other changes apply to the TensorFlow graph:</p>
<ul>
<li>The score threshold is set to 0.3, so the model will remove any prediction results with a confidence score lower than the threshold.</li>
<li>The IoU (intersection over union) threshold is set to 0.5 so that overlapping detections of the same class are removed. You can read more about IoU and non-max suppression <a href="https://www.dlology.com/blog/gentle-guide-on-how-yolo-object-localization-works-with-keras-part-2/">here</a>.</li>
<li>Apply modifications over the frozen object detection graph for improved speed and reduced memory consumption.</li>
</ul>
<p>Next, we create a TensorRT inference graph just like the image classification model.</p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">tensorflow.contrib.tensorrt</span> <span class="kn">as</span> <span class="nn">trt</span>
<span class="n">trt_graph</span> <span class="o">=</span> <span class="n">trt</span><span class="o">.</span><span class="n">create_inference_graph</span><span class="p">(</span>
<span class="n">input_graph_def</span><span class="o">=</span><span class="n">frozen_graph</span><span class="p">,</span>
<span class="n">outputs</span><span class="o">=</span><span class="n">output_names</span><span class="p">,</span>
<span class="n">max_batch_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">max_workspace_size_bytes</span><span class="o">=</span><span class="mi">1</span> <span class="o"><<</span> <span class="mi">25</span><span class="p">,</span>
<span class="n">precision_mode</span><span class="o">=</span><span class="s">'FP16'</span><span class="p">,</span>
<span class="n">minimum_segment_size</span><span class="o">=</span><span class="mi">50</span>
<span class="p">)</span>
</pre>
</div>
<p>Once you have the TensorRT inference graph, you can save it as a <strong>pb</strong> file and download it from Colab <span>to your local machine, then copy it to your Jetson Nano</span> as necessary.</p>
<div class="highlight">
<pre><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">'./data/trt_graph.pb'</span><span class="p">,</span> <span class="s">'wb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">trt_graph</span><span class="o">.</span><span class="n">SerializeToString</span><span class="p">())</span>
<span class="c1"># Download the tensorRT graph .pb file from colab to your local machine.</span>
<span class="kn">from</span> <span class="nn">google.colab</span> <span class="kn">import</span> <span class="n">files</span>
<span class="n">files</span><span class="o">.</span><span class="n">download</span><span class="p">(</span><span class="s">'./data/trt_graph.pb'</span><span class="p">)</span>
</pre>
</div>
<p></p>
<h2>Step 2: Loads TensorRT graph and make predictions</h2>
<p><span>On your Jetson Nano, start a Jupyter Notebook with command </span><code>jupyter notebook --ip=0.0.0.0</code><span><span> </span>where you have saved the downloaded graph file to </span><code>./model/trt_graph.pb</code><span>. The following code will load the TensorRT graph and make it ready for inferencing.</span></p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">tensorflow</span> <span class="kn">as</span> <span class="nn">tf</span>
<span class="k">def</span> <span class="nf">get_frozen_graph</span><span class="p">(</span><span class="n">graph_file</span><span class="p">):</span>
<span class="sd">"""Read Frozen Graph file from disk."""</span>
<span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">gfile</span><span class="o">.</span><span class="n">FastGFile</span><span class="p">(</span><span class="n">graph_file</span><span class="p">,</span> <span class="s">"rb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">graph_def</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">GraphDef</span><span class="p">()</span>
<span class="n">graph_def</span><span class="o">.</span><span class="n">ParseFromString</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
<span class="k">return</span> <span class="n">graph_def</span>
<span class="c1"># The TensorRT inference graph file downloaded from Colab or your local machine.</span>
<span class="n">pb_fname</span> <span class="o">=</span> <span class="s">"./model/trt_graph.pb"</span>
<span class="n">trt_graph</span> <span class="o">=</span> <span class="n">get_frozen_graph</span><span class="p">(</span><span class="n">pb_fname</span><span class="p">)</span>
<span class="n">input_names</span> <span class="o">=</span> <span class="p">[</span><span class="s">'image_tensor'</span><span class="p">]</span>
<span class="c1"># Create session and load graph</span>
<span class="n">tf_config</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">ConfigProto</span><span class="p">()</span>
<span class="n">tf_config</span><span class="o">.</span><span class="n">gpu_options</span><span class="o">.</span><span class="n">allow_growth</span> <span class="o">=</span> <span class="bp">True</span>
<span class="n">tf_sess</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">Session</span><span class="p">(</span><span class="n">config</span><span class="o">=</span><span class="n">tf_config</span><span class="p">)</span>
<span class="n">tf</span><span class="o">.</span><span class="n">import_graph_def</span><span class="p">(</span><span class="n">trt_graph</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">''</span><span class="p">)</span>
<span class="n">tf_input</span> <span class="o">=</span> <span class="n">tf_sess</span><span class="o">.</span><span class="n">graph</span><span class="o">.</span><span class="n">get_tensor_by_name</span><span class="p">(</span><span class="n">input_names</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="s">':0'</span><span class="p">)</span>
<span class="n">tf_scores</span> <span class="o">=</span> <span class="n">tf_sess</span><span class="o">.</span><span class="n">graph</span><span class="o">.</span><span class="n">get_tensor_by_name</span><span class="p">(</span><span class="s">'detection_scores:0'</span><span class="p">)</span>
<span class="n">tf_boxes</span> <span class="o">=</span> <span class="n">tf_sess</span><span class="o">.</span><span class="n">graph</span><span class="o">.</span><span class="n">get_tensor_by_name</span><span class="p">(</span><span class="s">'detection_boxes:0'</span><span class="p">)</span>
<span class="n">tf_classes</span> <span class="o">=</span> <span class="n">tf_sess</span><span class="o">.</span><span class="n">graph</span><span class="o">.</span><span class="n">get_tensor_by_name</span><span class="p">(</span><span class="s">'detection_classes:0'</span><span class="p">)</span>
<span class="n">tf_num_detections</span> <span class="o">=</span> <span class="n">tf_sess</span><span class="o">.</span><span class="n">graph</span><span class="o">.</span><span class="n">get_tensor_by_name</span><span class="p">(</span><span class="s">'num_detections:0'</span><span class="p">)</span>
</pre>
</div>
<p><span>Now, we can make a prediction with an image and see if the model gets it right. Notice we resized the image to 300 x 300; however, you can try other sizes or keep the size unmodified since the graph can handle variable-sized input. But keep in mind that the memory on the Jetson Nano is quite small compared to a desktop machine, so it can hardly handle large images.</span></p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">cv2</span>
<span class="n">IMAGE_PATH</span> <span class="o">=</span> <span class="s">"./data/dogs.jpg"</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">imread</span><span class="p">(</span><span class="n">IMAGE_PATH</span><span class="p">)</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">resize</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="p">(</span><span class="mi">300</span><span class="p">,</span> <span class="mi">300</span><span class="p">))</span>
<span class="n">scores</span><span class="p">,</span> <span class="n">boxes</span><span class="p">,</span> <span class="n">classes</span><span class="p">,</span> <span class="n">num_detections</span> <span class="o">=</span> <span class="n">tf_sess</span><span class="o">.</span><span class="n">run</span><span class="p">([</span><span class="n">tf_scores</span><span class="p">,</span> <span class="n">tf_boxes</span><span class="p">,</span> <span class="n">tf_classes</span><span class="p">,</span> <span class="n">tf_num_detections</span><span class="p">],</span> <span class="n">feed_dict</span><span class="o">=</span><span class="p">{</span>
<span class="n">tf_input</span><span class="p">:</span> <span class="n">image</span><span class="p">[</span><span class="bp">None</span><span class="p">,</span> <span class="o">...</span><span class="p">]</span>
<span class="p">})</span>
<span class="n">boxes</span> <span class="o">=</span> <span class="n">boxes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="c1"># index by 0 to remove batch dimension</span>
<span class="n">scores</span> <span class="o">=</span> <span class="n">scores</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">classes</span> <span class="o">=</span> <span class="n">classes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">num_detections</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">num_detections</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
</pre>
</div>
<p>If you have played around with the TensorFlow object detection API before, those outputs should look familiar.</p>
<p>Here the results might still contain overlapping predictions with different class labels. For example, the same object can be labeled with two classes in two overlapping bounding boxes.</p>
<p>We will use a custom non-max suppression function to remove overlapping bounding boxes with lower prediction scores; a minimal sketch of one is shown below.</p>
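<p>The exact helper lives in the notebook on GitHub; a minimal sketch of such a greedy non-max suppression, assuming boxes in (y1, x1, y2, x2) pixel coordinates, could look like this:</p>
<div class="highlight">
<pre>import numpy as np

def non_max_suppression(boxes, probs, overlap_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    the remaining boxes that overlap it more than overlap_thresh."""
    if len(boxes) == 0:
        return []
    boxes = boxes.astype("float")
    y1, x1, y2, x2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(probs)  # ascending, so the best box is last
    pick = []
    while len(idxs) > 0:
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)
        # Intersection of the picked box with every remaining box.
        yy1 = np.maximum(y1[i], y1[idxs[:last]])
        xx1 = np.maximum(x1[i], x1[idxs[:last]])
        yy2 = np.minimum(y2[i], y2[idxs[:last]])
        xx2 = np.minimum(x2[i], x2[idxs[:last]])
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)
        overlap = (w * h) / area[idxs[:last]]
        # Drop the picked box plus anything overlapping it too much.
        idxs = np.delete(
            idxs, np.concatenate(([last], np.where(overlap > overlap_thresh)[0])))
    return pick
</pre>
</div>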
<p>Let's visualize the result by drawing bounding boxes and label overlays.</p>
<p>Here is the code to create the overlays and display them in the Jetson Nano's notebook.</p>
<div class="highlight">
<pre><span class="kn">from</span> <span class="nn">IPython.display</span> <span class="kn">import</span> <span class="n">Image</span> <span class="k">as</span> <span class="n">DisplayImage</span>
<span class="c1"># Boxes unit in pixels (image coordinates).</span>
<span class="n">boxes_pixels</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_detections</span><span class="p">):</span>
<span class="c1"># scale box to image coordinates</span>
<span class="n">box</span> <span class="o">=</span> <span class="n">boxes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="n">image</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span>
<span class="n">image</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">image</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">image</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]])</span>
<span class="n">box</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">round</span><span class="p">(</span><span class="n">box</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span>
<span class="n">boxes_pixels</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">box</span><span class="p">)</span>
<span class="n">boxes_pixels</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">boxes_pixels</span><span class="p">)</span>
<span class="c1"># Remove overlapping boxes with non-max suppression, return picked indexes.</span>
<span class="n">pick</span> <span class="o">=</span> <span class="n">non_max_suppression</span><span class="p">(</span><span class="n">boxes_pixels</span><span class="p">,</span> <span class="n">scores</span><span class="p">[:</span><span class="n">num_detections</span><span class="p">],</span> <span class="mf">0.5</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">pick</span><span class="p">:</span>
<span class="n">box</span> <span class="o">=</span> <span class="n">boxes_pixels</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="n">box</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">round</span><span class="p">(</span><span class="n">box</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span>
<span class="c1"># Draw bounding box.</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">rectangle</span><span class="p">(</span>
<span class="n">image</span><span class="p">,</span> <span class="p">(</span><span class="n">box</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">box</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="p">(</span><span class="n">box</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span> <span class="n">box</span><span class="p">[</span><span class="mi">2</span><span class="p">]),</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">255</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">label</span> <span class="o">=</span> <span class="s">"{}:{:.2f}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">classes</span><span class="p">[</span><span class="n">i</span><span class="p">]),</span> <span class="n">scores</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
<span class="c1"># Draw label (class index and probability).</span>
<span class="n">draw_label</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="p">(</span><span class="n">box</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">box</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">label</span><span class="p">)</span>
<span class="c1"># Save and display the labeled image.</span>
<span class="n">save_image</span><span class="p">(</span><span class="n">image</span><span class="p">[:,</span> <span class="p">:,</span> <span class="p">::</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
<span class="n">DisplayImage</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s">"./data/img.png"</span><span class="p">)</span>
</pre>
</div>
<p><img alt="results" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/47b40972ff8c670987d4b0a3a1faa093de07e4dc/images/jetson/results.png"/></p>
<p>In the COCO label map, class 18 is a dog and class 23 is a bear. The two sitting dogs are incorrectly classified as bears here; perhaps there are more sitting bears than sitting dogs in the COCO dataset.</p>
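<p>If you would rather draw human-readable labels than raw class indices, a small lookup table does the trick. The snippet below is only a sketch with a handful of entries from the 90-class COCO label map (the full map ships with the TensorFlow Object Detection API):</p>
<div class="highlight">
<pre># A few entries of the 90-class COCO label map; extend as needed.
COCO_LABELS = {1: 'person', 17: 'cat', 18: 'dog', 23: 'bear'}

def class_name(class_id):
    # Fall back to the raw index for classes not listed above.
    return COCO_LABELS.get(int(class_id), str(int(class_id)))

# Drop-in replacement for the label line in the drawing loop above:
label = "{}:{:.2f}".format(class_name(classes[i]), scores[i])
</pre>
</div>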
<p>In a similar speed benchmark, the Jetson Nano achieved <strong>11.54 FPS</strong> with the SSD MobileNet V1 model and a 300 x 300 input image.</p>
<p>If you run into an out-of-memory issue, try booting the board without a monitor attached and logging into a shell over SSH, so you save the memory otherwise consumed by the GUI.</p>
<h2>Conclusion and further reading</h2>
<p>In this tutorial, you learned how to convert a TensorFlow object detection model and run inference on the Jetson Nano.</p>
<h4><em>Check out the updated <a href="https://github.com/Tony607/tf_jetson_nano">GitHub repo</a> for the source code.</em></h4>
<p>If you are not satisfied with the results, there are other pre-trained models to take a look at. I recommend starting with SSD MobileNet V2 (<strong>ssd_mobilenet_v2_coco</strong>), or, if you are adventurous, trying <strong>ssd_inception_v2_coco</strong>, which might push the limits of the Jetson Nano's memory.</p>
<p>You can find those models in the <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md">TensorFlow detection model zoo</a>; the "Speed (ms)" metric gives you a guideline on the complexity of each model.</p>
<p>Thinking about training a custom object detection model with a free data center GPU? Check out my previous tutorial, <a href="https://www.dlology.com/blog/how-to-train-an-object-detection-model-easy-for-free/">How to train an object detection model easy for free</a>.</p>
<h1>How to run Keras model on Jetson Nano</h1>
<p>2019-04-13 · Chengwei · <a href="https://www.dlology.com/blog/how-to-run-keras-model-on-jetson-nano/">https://www.dlology.com/blog/how-to-run-keras-model-on-jetson-nano/</a></p>
<p><img alt="keras-jetson-nano" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/995232447efb1156e88c59a773cf7f64fa202a8b/images/jetson/keras-jetson-nano.png"/></p>
<p><a href="https://developer.nvidia.com/embedded/buy/jetson-nano-devkit" rel="noopener" target="_blank">Jetson Nano Developer Kit</a> announced <span>at the 2019 GTC for $99 brings a new rival to the arena of edge computing hardware alongside its more pricy predecessors, Jetson TX1 and TX2. The coming of Jetson Nano gives the company a competitive advantage over other affordable options, to name a few, <a href="https://software.intel.com/en-us/movidius-ncs">Movidius neural compute stick</a>, <a href="https://www.dlology.com/blog/how-to-run-keras-model-inference-x3-times-faster-with-cpu-and-intel-openvino-1/">Intel Graphics running OpenVINO</a> and <a href="https://cloud.google.com/edge-tpu/">Google edge TPU</a>.</span></p>
<p>In this post, I will show you how to run a Keras model on the Jetson Nano.</p>
<p>Here is a breakdown of how to make it happen.</p>
<ol>
<li>Freeze the Keras model to a TensorFlow graph, then create an inference graph with TensorRT.</li>
<li>Load the TensorRT inference graph on the Jetson Nano and make predictions.</li>
</ol>
<p>We will do the first step on a development machine, since it is computationally and resource intensive far beyond what the Jetson Nano can handle.</p>
<p>Let's get started.</p>
<h2>Setup Jetson Nano</h2>
<p>Follow the <a href="https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit">official getting started guide</a> to flash the latest SD card image, setup, and boot.</p>
<p>One thing to keep in mind: the Jetson Nano doesn't come with a WiFi radio as the latest Raspberry Pi does, so it is recommended to have a USB WiFi dongle like <a href="https://www.amazon.com/dp/B003MTTJOY//ref=cm_sw_su_dp">this one</a> ready, unless you plan to hardwire its Ethernet jack instead.</p>
<h3>Install TensorFlow on Jetson Nano</h3>
<p>There is <a href="https://devtalk.nvidia.com/default/topic/1048776/jetson-nano/official-tensorflow-for-jetson-nano-/2">a thread</a> on the NVIDIA developer forum about official TensorFlow support on the Jetson Nano; here is a quick rundown of how to install it.</p>
<p>Start a terminal or SSH into your Jetson Nano, then run these commands.</p>
<div class="highlight">
<pre><span class="n">sudo</span> <span class="n">apt</span> <span class="n">update</span>
<span class="n">sudo</span> <span class="n">apt</span> <span class="n">install</span> <span class="n">python3</span><span class="o">-</span><span class="n">pip</span> <span class="n">libhdf5</span><span class="o">-</span><span class="n">serial</span><span class="o">-</span><span class="n">dev</span> <span class="n">hdf5</span><span class="o">-</span><span class="n">tools</span>
<span class="n">pip3</span> <span class="n">install</span> <span class="o">--</span><span class="n">extra</span><span class="o">-</span><span class="n">index</span><span class="o">-</span><span class="n">url</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">developer</span><span class="o">.</span><span class="n">download</span><span class="o">.</span><span class="n">nvidia</span><span class="o">.</span><span class="n">com</span><span class="o">/</span><span class="n">compute</span><span class="o">/</span><span class="n">redist</span><span class="o">/</span><span class="n">jp</span><span class="o">/</span><span class="n">v42</span> <span class="n">tensorflow</span><span class="o">-</span><span class="n">gpu</span><span class="o">==</span><span class="mf">1.13</span><span class="o">.</span><span class="mi">1</span><span class="o">+</span><span class="n">nv19</span><span class="o">.</span><span class="mi">3</span> <span class="o">--</span><span class="n">user</span>
</pre>
</div>
<p>In case you get into the error below,</p>
<pre>Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel</pre>
<p>Try running:</p>
<pre>sudo apt install python3.6-dev</pre>
<p>Python 3 might get updated to a later version in the future. You can always check your version first with <strong>python3 --version</strong>, and change the previous command accordingly.</p>
<p>It is also helpful to install Jupyter Notebook so you can remotely connect to it from a development machine.</p>
<pre>pip3 install jupyter</pre>
<p>Also, notice that Python OpenCV version 3.3.1 is already installed, which saves a lot of cross-compiling pain. You can verify this by importing the <strong>cv2</strong> library from the Python 3 command line interface.</p>
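<p>For example, a one-liner from the terminal:</p>
<pre>python3 -c "import cv2; print(cv2.__version__)"</pre>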
<h2>Step 1: Freeze the Keras model and convert it into a TensorRT model</h2>
<p>Run this step on your development machine with a <a href="https://github.com/tensorflow/tensorrt#installing-tf-trt">TensorFlow nightly build</a>, which includes TF-TRT by default, or run it on <a href="https://colab.research.google.com/github/Tony607/tf_jetson_nano/blob/master/Step1_Colab_TensorRT.ipynb">this Colab notebook</a>'s free GPU.</p>
<p>First, let's load a Keras model. For this tutorial, we use the pre-trained MobileNetV2 that comes with Keras; feel free to replace it with your custom model when necessary.</p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">from</span> <span class="nn">tensorflow.keras.applications.mobilenet_v2</span> <span class="kn">import</span> <span class="n">MobileNetV2</span> <span class="k">as</span> <span class="n">Net</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">Net</span><span class="p">(</span><span class="n">weights</span><span class="o">=</span><span class="s">'imagenet'</span><span class="p">)</span>
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="s">'./model'</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="c1"># Save the h5 file to path specified.</span>
<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s">"./model/model.h5"</span><span class="p">)</span>
</pre>
</div>
<p>Once you have the Keras model saved as a single <code>.h5</code> file, you can freeze it into a TensorFlow graph for inference.</p>
<p>Take note of the input and output node names printed in the output. We will need them when converting the <code>TensorRT</code> inference graph and when making predictions.</p>
<p>For the Keras MobileNetV2 model, they are <code>['input_1'] ['Logits/Softmax']</code>.</p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">tensorflow</span> <span class="kn">as</span> <span class="nn">tf</span>
<span class="kn">from</span> <span class="nn">tensorflow.python.framework</span> <span class="kn">import</span> <span class="n">graph_io</span>
<span class="kn">from</span> <span class="nn">tensorflow.keras.models</span> <span class="kn">import</span> <span class="n">load_model</span>
<span class="c1"># Clear any previous session.</span>
<span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">clear_session</span><span class="p">()</span>
<span class="n">save_pb_dir</span> <span class="o">=</span> <span class="s">'./model'</span>
<span class="n">model_fname</span> <span class="o">=</span> <span class="s">'./model/model.h5'</span>
<span class="k">def</span> <span class="nf">freeze_graph</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">session</span><span class="p">,</span> <span class="n">output</span><span class="p">,</span> <span class="n">save_pb_dir</span><span class="o">=</span><span class="s">'.'</span><span class="p">,</span> <span class="n">save_pb_name</span><span class="o">=</span><span class="s">'frozen_model.pb'</span><span class="p">,</span> <span class="n">save_pb_as_text</span><span class="o">=</span><span class="bp">False</span><span class="p">):</span>
<span class="k">with</span> <span class="n">graph</span><span class="o">.</span><span class="n">as_default</span><span class="p">():</span>
<span class="n">graphdef_inf</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">graph_util</span><span class="o">.</span><span class="n">remove_training_nodes</span><span class="p">(</span><span class="n">graph</span><span class="o">.</span><span class="n">as_graph_def</span><span class="p">())</span>
<span class="n">graphdef_frozen</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">graph_util</span><span class="o">.</span><span class="n">convert_variables_to_constants</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="n">graphdef_inf</span><span class="p">,</span> <span class="n">output</span><span class="p">)</span>
<span class="n">graph_io</span><span class="o">.</span><span class="n">write_graph</span><span class="p">(</span><span class="n">graphdef_frozen</span><span class="p">,</span> <span class="n">save_pb_dir</span><span class="p">,</span> <span class="n">save_pb_name</span><span class="p">,</span> <span class="n">as_text</span><span class="o">=</span><span class="n">save_pb_as_text</span><span class="p">)</span>
<span class="k">return</span> <span class="n">graphdef_frozen</span>
<span class="c1"># This line must be executed before loading Keras model.</span>
<span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">set_learning_phase</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">load_model</span><span class="p">(</span><span class="n">model_fname</span><span class="p">)</span>
<span class="n">session</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">get_session</span><span class="p">()</span>
<span class="n">input_names</span> <span class="o">=</span> <span class="p">[</span><span class="n">t</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">name</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">inputs</span><span class="p">]</span>
<span class="n">output_names</span> <span class="o">=</span> <span class="p">[</span><span class="n">t</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">name</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">outputs</span><span class="p">]</span>
<span class="c1"># Prints input and output nodes names, take notes of them.</span>
<span class="k">print</span><span class="p">(</span><span class="n">input_names</span><span class="p">,</span> <span class="n">output_names</span><span class="p">)</span>
<span class="n">frozen_graph</span> <span class="o">=</span> <span class="n">freeze_graph</span><span class="p">(</span><span class="n">session</span><span class="o">.</span><span class="n">graph</span><span class="p">,</span> <span class="n">session</span><span class="p">,</span> <span class="p">[</span><span class="n">out</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">name</span> <span class="k">for</span> <span class="n">out</span> <span class="ow">in</span> <span class="n">model</span><span class="o">.</span><span class="n">outputs</span><span class="p">],</span> <span class="n">save_pb_dir</span><span class="o">=</span><span class="n">save_pb_dir</span><span class="p">)</span>
</pre>
</div>
<p>Normally, this frozen graph is what you would use for deployment. However, it is not optimized for speed or resource efficiency on the Jetson Nano. That is where TensorRT comes into play: it quantizes the model from FP32 to FP16, effectively reducing memory consumption, and it fuses layers and tensors together, which further optimizes the use of GPU memory and bandwidth. All of this comes with little or no noticeable loss of accuracy.</p>
<p>And it can be done in a single call:</p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">tensorflow.contrib.tensorrt</span> <span class="kn">as</span> <span class="nn">trt</span>
<span class="n">trt_graph</span> <span class="o">=</span> <span class="n">trt</span><span class="o">.</span><span class="n">create_inference_graph</span><span class="p">(</span>
<span class="n">input_graph_def</span><span class="o">=</span><span class="n">frozen_graph</span><span class="p">,</span>
<span class="n">outputs</span><span class="o">=</span><span class="n">output_names</span><span class="p">,</span>
<span class="n">max_batch_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">max_workspace_size_bytes</span><span class="o">=</span><span class="mi">1</span> <span class="o"><<</span> <span class="mi">25</span><span class="p">,</span>
<span class="n">precision_mode</span><span class="o">=</span><span class="s">'FP16'</span><span class="p">,</span>
<span class="n">minimum_segment_size</span><span class="o">=</span><span class="mi">50</span>
<span class="p">)</span>
</pre>
</div>
<p>The result is still a TensorFlow graph, but one optimized by TensorRT to run on your Jetson Nano. Let's save it as a single <code>.pb</code> file.</p>
<div class="highlight">
<pre><span class="n">graph_io</span><span class="o">.</span><span class="n">write_graph</span><span class="p">(</span><span class="n">trt_graph</span><span class="p">,</span> <span class="s">"./model/"</span><span class="p">,</span>
<span class="s">"trt_graph.pb"</span><span class="p">,</span> <span class="n">as_text</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</pre>
</div>
<p>Download the TensorRT graph <code>.pb</code> file, either from Colab or your local machine, onto your Jetson Nano. You can use scp/sftp to copy the file remotely. On Windows, you can use <a href="https://winscp.net/eng/index.php">WinSCP</a>; on Linux/Mac, you can use scp/sftp from the command line.</p>
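<p>For example, from the development machine (the username and IP address below are placeholders for your own board):</p>
<pre>scp ./model/trt_graph.pb nano@192.168.1.42:~/model/trt_graph.pb</pre>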
<h2>Step 2: Load the TensorRT graph and make predictions</h2>
<p>On your Jetson Nano, start a Jupyter Notebook with the command <code>jupyter notebook --ip=0.0.0.0</code> from the directory where you saved the downloaded graph file as <code>./model/trt_graph.pb</code>. The following code will load the TensorRT graph and make it ready for inference.</p>
<p>The input and output names might be different if you chose a Keras model other than MobileNetV2.</p>
<div class="highlight">
<pre><span class="n">output_names</span> <span class="o">=</span> <span class="p">[</span><span class="s">'Logits/Softmax'</span><span class="p">]</span>
<span class="n">input_names</span> <span class="o">=</span> <span class="p">[</span><span class="s">'input_1'</span><span class="p">]</span>
<span class="kn">import</span> <span class="nn">tensorflow</span> <span class="kn">as</span> <span class="nn">tf</span>
<span class="k">def</span> <span class="nf">get_frozen_graph</span><span class="p">(</span><span class="n">graph_file</span><span class="p">):</span>
<span class="sd">"""Read Frozen Graph file from disk."""</span>
<span class="k">with</span> <span class="n">tf</span><span class="o">.</span><span class="n">gfile</span><span class="o">.</span><span class="n">FastGFile</span><span class="p">(</span><span class="n">graph_file</span><span class="p">,</span> <span class="s">"rb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">graph_def</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">GraphDef</span><span class="p">()</span>
<span class="n">graph_def</span><span class="o">.</span><span class="n">ParseFromString</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
<span class="k">return</span> <span class="n">graph_def</span>
<span class="n">trt_graph</span> <span class="o">=</span> <span class="n">get_frozen_graph</span><span class="p">(</span><span class="s">'./model/trt_graph.pb'</span><span class="p">)</span>
<span class="c1"># Create session and load graph</span>
<span class="n">tf_config</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">ConfigProto</span><span class="p">()</span>
<span class="n">tf_config</span><span class="o">.</span><span class="n">gpu_options</span><span class="o">.</span><span class="n">allow_growth</span> <span class="o">=</span> <span class="bp">True</span>
<span class="n">tf_sess</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">Session</span><span class="p">(</span><span class="n">config</span><span class="o">=</span><span class="n">tf_config</span><span class="p">)</span>
<span class="n">tf</span><span class="o">.</span><span class="n">import_graph_def</span><span class="p">(</span><span class="n">trt_graph</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">''</span><span class="p">)</span>
<span class="c1"># Get graph input size</span>
<span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">trt_graph</span><span class="o">.</span><span class="n">node</span><span class="p">:</span>
<span class="k">if</span> <span class="s">'input_'</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span><span class="p">:</span>
<span class="n">size</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">attr</span><span class="p">[</span><span class="s">'shape'</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span>
<span class="n">image_size</span> <span class="o">=</span> <span class="p">[</span><span class="n">size</span><span class="o">.</span><span class="n">dim</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">size</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">)]</span>
<span class="k">break</span>
<span class="k">print</span><span class="p">(</span><span class="s">"image_size: {}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">image_size</span><span class="p">))</span>
<span class="c1"># input and output tensor names.</span>
<span class="n">input_tensor_name</span> <span class="o">=</span> <span class="n">input_names</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="s">":0"</span>
<span class="n">output_tensor_name</span> <span class="o">=</span> <span class="n">output_names</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="s">":0"</span>
<span class="k">print</span><span class="p">(</span><span class="s">"input_tensor_name: {}</span><span class="se">\n</span><span class="s">output_tensor_name: {}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
<span class="n">input_tensor_name</span><span class="p">,</span> <span class="n">output_tensor_name</span><span class="p">))</span>
<span class="n">output_tensor</span> <span class="o">=</span> <span class="n">tf_sess</span><span class="o">.</span><span class="n">graph</span><span class="o">.</span><span class="n">get_tensor_by_name</span><span class="p">(</span><span class="n">output_tensor_name</span><span class="p">)</span>
</pre>
</div>
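<p>If you are unsure of the names for your own model, you can list the nodes of the imported graph and look for the input and the final softmax/logits nodes, for example:</p>
<div class="highlight">
<pre># Print the node names of the imported graph to locate inputs and outputs.
for node in trt_graph.node:
    print(node.name)
</pre>
</div>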
<p>Now we can make a prediction with an elephant picture and see if the model gets it right.</p>
<div class="highlight">
<pre><span class="kn">from</span> <span class="nn">tensorflow.keras.preprocessing</span> <span class="kn">import</span> <span class="n">image</span>
<span class="kn">from</span> <span class="nn">tensorflow.keras.applications.mobilenet_v2</span> <span class="kn">import</span> <span class="n">preprocess_input</span><span class="p">,</span> <span class="n">decode_predictions</span>
<span class="c1"># Optional image to test model prediction.</span>
<span class="n">img_path</span> <span class="o">=</span> <span class="s">'./data/elephant.jpg'</span>
<span class="n">img</span> <span class="o">=</span> <span class="n">image</span><span class="o">.</span><span class="n">load_img</span><span class="p">(</span><span class="n">img_path</span><span class="p">,</span> <span class="n">target_size</span><span class="o">=</span><span class="n">image_size</span><span class="p">[:</span><span class="mi">2</span><span class="p">])</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">image</span><span class="o">.</span><span class="n">img_to_array</span><span class="p">(</span><span class="n">img</span><span class="p">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">expand_dims</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">preprocess_input</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">feed_dict</span> <span class="o">=</span> <span class="p">{</span>
<span class="n">input_tensor_name</span><span class="p">:</span> <span class="n">x</span>
<span class="p">}</span>
<span class="n">preds</span> <span class="o">=</span> <span class="n">tf_sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">output_tensor</span><span class="p">,</span> <span class="n">feed_dict</span><span class="p">)</span>
<span class="c1"># decode the results into a list of tuples (class, description, probability)</span>
<span class="c1"># (one such list for each sample in the batch)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Predicted:'</span><span class="p">,</span> <span class="n">decode_predictions</span><span class="p">(</span><span class="n">preds</span><span class="p">,</span> <span class="n">top</span><span class="o">=</span><span class="mi">3</span><span class="p">)[</span><span class="mi">0</span><span class="p">])</span>
</pre>
</div>
<h2><span>Benchmark results</span></h2>
<p>Let's run the inference several times and see how fast it can go.</p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">time</span>
<span class="n">times</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">20</span><span class="p">):</span>
<span class="n">start_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
<span class="n">one_prediction</span> <span class="o">=</span> <span class="n">tf_sess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">output_tensor</span><span class="p">,</span> <span class="n">feed_dict</span><span class="p">)</span>
<span class="n">delta</span> <span class="o">=</span> <span class="p">(</span><span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">start_time</span><span class="p">)</span>
<span class="n">times</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">delta</span><span class="p">)</span>
<span class="n">mean_delta</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">times</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
<span class="n">fps</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">mean_delta</span>
<span class="k">print</span><span class="p">(</span><span class="s">'average(sec):{:.2f},fps:{:.2f}'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">mean_delta</span><span class="p">,</span> <span class="n">fps</span><span class="p">))</span>
</pre>
</div>
<p>It achieved 27.18 FPS, which can be considered real-time prediction. In addition, the Keras model can run inference at 60 FPS on Colab's Tesla K80 GPU, about twice as fast as the Jetson Nano, but that is a data center card.</p>
<h2>Conclusion and further reading</h2>
<p>In this tutorial, we walked through how to convert and optimize your Keras image classification model with TensorRT and run inference on the Jetson Nano dev kit. Now, try another Keras ImageNet model or your custom model, connect a USB webcam or Raspberry Pi camera to it, and do a real-time prediction demo (a minimal sketch follows); be sure to share your results with us in the comments below.</p>
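<p>As a starting point for such a demo, here is a minimal webcam loop. It is only a sketch: it assumes the session, tensors, <code>image_size</code>, and the MobileNetV2 preprocessing helpers from the code above, and it prints the top prediction per frame instead of rendering a UI:</p>
<div class="highlight">
<pre>import cv2
import numpy as np

cap = cv2.VideoCapture(0)  # first attached USB webcam
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV delivers BGR; convert to RGB and resize to the model input.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        rgb = cv2.resize(rgb, (image_size[1], image_size[0]))
        x = preprocess_input(np.expand_dims(rgb.astype(np.float32), axis=0))
        preds = tf_sess.run(output_tensor, {input_tensor_name: x})
        top = decode_predictions(preds, top=1)[0][0]  # (class, description, probability)
        print("{}: {:.2f}".format(top[1], top[2]))
except KeyboardInterrupt:
    cap.release()
</pre>
</div>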
<p>In the future, we will look into running models for other applications, such as object detection. If you are interested in other affordable edge computing options, check out my <a href="https://www.dlology.com/blog/how-to-run-keras-model-inference-x3-times-faster-with-cpu-and-intel-openvino-1/">previous post</a> on running Keras model inference 3x faster with a CPU and Intel OpenVINO, which also works with the Movidius Neural Compute Stick on Linux/Windows and the Raspberry Pi.</p>
<h4><em>The source code for this tutorial is available on <a href="https://github.com/Tony607/tf_jetson_nano">my GitHub repo</a>. You can also skip the step 1 model conversion and download the <a href="https://github.com/Tony607/tf_jetson_nano/releases/download/V0.1/trt_graph.pb">trt_graph.pb</a> file directly from the GitHub repo releases.</em></h4>
<h1>How to do Hyper-parameter search with Bayesian optimization for Keras model</h1>
<p>2019-04-06 · Chengwei · <a href="https://www.dlology.com/blog/how-to-do-hyperparameter-search-with-baysian-optimization-for-keras-model/">https://www.dlology.com/blog/how-to-do-hyperparameter-search-with-baysian-optimization-for-keras-model/</a></p>
<p><img alt="search" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/4288a1640e7dea8f7566df4a5dcf39ba07f28331/images/bayesian/search.png"/></p>
<p>Compared to simpler hyperparameter search methods like grid search and random search, Bayesian optimization is built upon Bayesian inference and Gaussian processes, and it attempts to find the maximum value of an unknown function in as few iterations as possible. It is particularly suited to optimizing high-cost functions, like hyperparameter search for a deep learning model, or other situations where the balance between exploration and exploitation is important.</p>
<p>The Bayesian optimization package we are going to use is <a href="https://github.com/fmfn/BayesianOptimization">BayesianOptimization</a>, which can be installed with the following command:</p>
<pre><code>pip install bayesian-optimization</code></pre>
<p>First, we will specify the function to be optimized: in our case, the hyperparameter search. The function takes a set of hyperparameter values as input and outputs the evaluation accuracy for the Bayesian optimizer. Inside the function, a new model is constructed with the specified hyperparameters, trained for a number of epochs, and evaluated against a set of metrics. Each new evaluation accuracy becomes a new observation for the Bayesian optimizer, which informs the next hyperparameter values to try.</p>
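<p>To make that observe-and-suggest loop concrete before we wire in Keras, here is a toy run of the optimizer on a stand-in "black box" function (the function below is illustrative only, not part of the tutorial's model):</p>
<div class="highlight">
<pre>from bayes_opt import BayesianOptimization

# A toy black-box function standing in for "train and evaluate a model".
def black_box(x):
    return -(x - 2) ** 2 + 1  # maximum value of 1 at x = 2

toy_optimizer = BayesianOptimization(f=black_box, pbounds={'x': (-2, 6)}, random_state=1)
toy_optimizer.maximize(init_points=2, n_iter=5)
print(toy_optimizer.max)  # best observed target and the x that produced it
</pre>
</div>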
<p>Let's first create a helper function that builds the model with the given parameters.</p>
<div class="highlight">
<pre><span class="kn">from</span> <span class="nn">tensorflow.keras.models</span> <span class="kn">import</span> <span class="n">Sequential</span>
<span class="kn">from</span> <span class="nn">tensorflow.keras.layers</span> <span class="kn">import</span> <span class="n">Dense</span><span class="p">,</span> <span class="n">Conv2D</span><span class="p">,</span> <span class="n">Dropout</span><span class="p">,</span> <span class="n">BatchNormalization</span><span class="p">,</span> <span class="n">MaxPooling2D</span><span class="p">,</span> <span class="n">Flatten</span><span class="p">,</span> <span class="n">Activation</span>
<span class="kn">from</span> <span class="nn">tensorflow.python.keras.optimizer_v2</span> <span class="kn">import</span> <span class="n">rmsprop</span>
<span class="k">def</span> <span class="nf">get_model</span><span class="p">(</span><span class="n">input_shape</span><span class="p">,</span> <span class="n">dropout2_rate</span><span class="o">=</span><span class="mf">0.5</span><span class="p">):</span>
<span class="sd">"""Builds a Sequential CNN model to recognize MNIST.</span>
<span class="sd"> Args:</span>
<span class="sd"> input_shape: Shape of the input depending on the `image_data_format`.</span>
<span class="sd"> dropout2_rate: float between 0 and 1. Fraction of the input units to drop for `dropout_2` layer.</span>
<span class="sd"> Returns:</span>
<span class="sd"> a Keras model</span>
<span class="sd"> """</span>
<span class="c1"># Reset the tensorflow backend session.</span>
<span class="c1"># tf.keras.backend.clear_session()</span>
<span class="c1"># Define a CNN model to recognize MNIST.</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">()</span>
<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span>
<span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span>
<span class="n">input_shape</span><span class="o">=</span><span class="n">input_shape</span><span class="p">,</span>
<span class="n">name</span><span class="o">=</span><span class="s">"conv2d_1"</span><span class="p">))</span>
<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"conv2d_2"</span><span class="p">))</span>
<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">MaxPooling2D</span><span class="p">(</span><span class="n">pool_size</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">"maxpool2d_1"</span><span class="p">))</span>
<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.25</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"dropout_1"</span><span class="p">))</span>
<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Flatten</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">"flatten"</span><span class="p">))</span>
<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"dense_1"</span><span class="p">))</span>
<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="n">dropout2_rate</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"dropout_2"</span><span class="p">))</span>
<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="n">NUM_CLASSES</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'softmax'</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"dense_2"</span><span class="p">))</span>
<span class="k">return</span> <span class="n">model</span>
</pre>
</div>
<p>Then comes the function to be optimized with the Bayesian optimizer. The <strong>partial</strong> function pins down two arguments of <code>fit_with</code>, <code>input_shape</code> and <code>verbose</code>, which have fixed values during the run.</p>
<p>The function takes the two hyperparameters to search, the dropout rate for the "dropout_2" layer and the learning rate; it trains the model for one epoch and outputs the evaluation accuracy for the Bayesian optimizer.</p>
<div class="highlight">
<pre><span class="k">def</span> <span class="nf">fit_with</span><span class="p">(</span><span class="n">input_shape</span><span class="p">,</span> <span class="n">verbose</span><span class="p">,</span> <span class="n">dropout2_rate</span><span class="p">,</span> <span class="n">lr</span><span class="p">):</span>
<span class="c1"># Create the model using a specified hyperparameters.</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">get_model</span><span class="p">(</span><span class="n">input_shape</span><span class="p">,</span> <span class="n">dropout2_rate</span><span class="p">)</span>
<span class="c1"># Train the model for a specified number of epochs.</span>
<span class="n">optimizer</span> <span class="o">=</span> <span class="n">rmsprop</span><span class="o">.</span><span class="n">RMSProp</span><span class="p">(</span><span class="n">learning_rate</span><span class="o">=</span><span class="n">lr</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="n">loss</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">losses</span><span class="o">.</span><span class="n">categorical_crossentropy</span><span class="p">,</span>
<span class="n">optimizer</span><span class="o">=</span><span class="n">optimizer</span><span class="p">,</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s">'accuracy'</span><span class="p">])</span>
<span class="c1"># Train the model with the train dataset.</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">train_ds</span><span class="p">,</span> <span class="n">epochs</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">steps_per_epoch</span><span class="o">=</span><span class="mi">468</span><span class="p">,</span>
<span class="n">batch_size</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="n">verbose</span><span class="p">)</span>
<span class="c1"># Evaluate the model with the eval dataset.</span>
<span class="n">score</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">eval_ds</span><span class="p">,</span> <span class="n">steps</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Test loss:'</span><span class="p">,</span> <span class="n">score</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Test accuracy:'</span><span class="p">,</span> <span class="n">score</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="c1"># Return the accuracy.</span>
<span class="k">return</span> <span class="n">score</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">partial</span>
<span class="n">verbose</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">fit_with_partial</span> <span class="o">=</span> <span class="n">partial</span><span class="p">(</span><span class="n">fit_with</span><span class="p">,</span> <span class="n">input_shape</span><span class="p">,</span> <span class="n">verbose</span><span class="p">)</span>
</pre>
</div>
<p>The <strong>BayesianOptimization</strong> object works out of the box without much tuning. The constructor takes the function to be optimized as well as the boundaries of the hyperparameters to search. The main method you should be aware of is <code>maximize</code>, which does exactly what you think it does: it maximizes the evaluation accuracy over the hyperparameters.</p>
<div class="highlight">
<pre><span class="kn">from</span> <span class="nn">bayes_opt</span> <span class="kn">import</span> <span class="n">BayesianOptimization</span>
<span class="c1"># Bounded region of parameter space</span>
<span class="n">pbounds</span> <span class="o">=</span> <span class="p">{</span><span class="s">'dropout2_rate'</span><span class="p">:</span> <span class="p">(</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">),</span> <span class="s">'lr'</span><span class="p">:</span> <span class="p">(</span><span class="mf">1e-4</span><span class="p">,</span> <span class="mf">1e-2</span><span class="p">)}</span>
<span class="n">optimizer</span> <span class="o">=</span> <span class="n">BayesianOptimization</span><span class="p">(</span>
<span class="n">f</span><span class="o">=</span><span class="n">fit_with_partial</span><span class="p">,</span>
<span class="n">pbounds</span><span class="o">=</span><span class="n">pbounds</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="c1"># verbose = 1 prints only when a maximum is observed, verbose = 0 is silent</span>
<span class="n">random_state</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">optimizer</span><span class="o">.</span><span class="n">maximize</span><span class="p">(</span><span class="n">init_points</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">n_iter</span><span class="o">=</span><span class="mi">10</span><span class="p">,)</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">res</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">optimizer</span><span class="o">.</span><span class="n">res</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Iteration {}: </span><span class="se">\n\t</span><span class="s">{}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">res</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="n">optimizer</span><span class="o">.</span><span class="n">max</span><span class="p">)</span>
</pre>
</div>
<p>There are many parameters you can pass to <code>maximize</code>; nonetheless, the most important ones are:</p>
<ul>
<li><code>n_iter</code>: How many steps of Bayesian optimization you want to perform. The more steps, the more likely you are to find a good maximum. See the example after this list.</li>
<li><code>init_points</code>: How many steps of <strong>random</strong> exploration you want to perform. Random exploration can help by diversifying the exploration space.</li>
</ul>
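<p>For instance, to spend more budget on random exploration before the guided steps (the values below are illustrative):</p>
<div class="highlight">
<pre># 20 random probes first, then 40 Bayesian-guided steps.
optimizer.maximize(init_points=20, n_iter=40)
</pre>
</div>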
<pre>| iter | target | dropou... | lr |
-------------------------------------------------
468/468 [==============================] - 4s 8ms/step - loss: 0.2575 - acc: 0.9246
Test loss: 0.061651699058711526
Test accuracy: 0.9828125
| 1 | 0.9828 | 0.2668 | 0.007231 |
468/468 [==============================] - 4s 8ms/step - loss: 0.2065 - acc: 0.9363
Test loss: 0.04886047407053411
Test accuracy: 0.9828125
| 2 | 0.9828 | 0.1 | 0.003093 |
468/468 [==============================] - 4s 8ms/step - loss: 0.2199 - acc: 0.9336
Test loss: 0.05553104653954506
Test accuracy: 0.98125
| 3 | 0.9812 | 0.1587 | 0.001014 |
468/468 [==============================] - 4s 9ms/step - loss: 0.2075 - acc: 0.9390
Test loss: 0.04128134781494737
<strong>Test accuracy: 0.9890625</strong>
<strong>| 4 | 0.9891 | 0.1745 | 0.003521 |</strong></pre>
<p>After only 4 search steps, the model built with the found hyperparameters achieves an evaluation accuracy of 98.9% with just one epoch of training.</p>
<h2>Comparing to other search methods</h2>
<p>Unlike grid search, which looks at a finite number of discrete hyperparameter combinations, Bayesian optimization with Gaussian processes doesn't lend itself to an easy or intuitive way of dealing with discrete parameters.</p>
<p>For example, suppose we want to choose the number of neurons in a dense layer from a list of options. To apply Bayesian optimization, we need to explicitly convert the continuous input parameter to a discrete one before constructing the model.</p>
<p>You can do something like this.</p>
<div class="highlight">
<pre><span class="n">pbounds</span> <span class="o">=</span> <span class="p">{</span><span class="s">'dropout2_rate'</span><span class="p">:</span> <span class="p">(</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">),</span> <span class="s">'lr'</span><span class="p">:</span> <span class="p">(</span><span class="mf">1e-4</span><span class="p">,</span> <span class="mf">1e-2</span><span class="p">),</span> <span class="s">"dense_1_neurons_x128"</span><span class="p">:</span> <span class="p">(</span><span class="mf">0.9</span><span class="p">,</span> <span class="mf">3.1</span><span class="p">)}</span>
<span class="k">def</span> <span class="nf">fit_with</span><span class="p">(</span><span class="n">input_shape</span><span class="p">,</span> <span class="n">verbose</span><span class="p">,</span> <span class="n">dropout2_rate</span><span class="p">,</span> <span class="n">dense_1_neurons_x128</span><span class="p">,</span> <span class="n">lr</span><span class="p">):</span>
<span class="c1"># Create the model using a specified hyperparameters.</span>
<span class="n">dense_1_neurons</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">dense_1_neurons_x128</span> <span class="o">*</span> <span class="mi">128</span><span class="p">),</span> <span class="mi">128</span><span class="p">)</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">get_model</span><span class="p">(</span><span class="n">input_shape</span><span class="p">,</span> <span class="n">dropout2_rate</span><span class="p">,</span> <span class="n">dense_1_neurons</span><span class="p">)</span>
<span class="c1"># ...</span>
</pre>
</div>
<p>The dense layer's neuron count is thus mapped to 3 unique discrete values, 128, 256, and 384, before the model is constructed.</p>
<p>In Bayesian optimization, each next set of search values depends on the previous observations (the previous evaluation accuracies), so the whole optimization process is hard to distribute or parallelize, unlike grid or random search.</p>
<h2><span>Conclusion and further reading</span></h2>
<p>This quick tutorial introduced how to do hyperparameter search with Bayesian optimization. It can be more efficient than other methods like grid or random search, since every search step is "<strong>guided</strong>" by the previous results.</p>
<h3>Some material you might find helpful</h3>
<p><a href="https://github.com/fmfn/BayesianOptimization">BayesianOptimization </a>- The Python implementation of global optimization with Gaussian processes used in this tutorial.</p>
<p><a href="https://www.dlology.com/blog/how-to-perform-keras-hyperparameter-optimization-on-tpu-for-free/">How to perform Keras hyperparameter optimization x3 faster on TPU for free</a> - My previous tutorial on performing grid <span>hyperparameter</span> search with Colab's free TPU.</p>
<h4><em>Check out the full source code on my <a href="https://github.com/Tony607/Keras_BayesianOptimization">GitHub</a>.</em></h4>
<h1>How to run TensorBoard in Jupyter Notebook</h1>
<p>2019-03-17 · Chengwei · <a href="https://www.dlology.com/blog/how-to-run-tensorboard-in-jupyter-notebook/">https://www.dlology.com/blog/how-to-run-tensorboard-in-jupyter-notebook/</a></p>
<p><img alt="jtb" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/b23e1a0f87d4653504646628dfc85759805fa1ef/images/tf2/jtb.png"/></p>
<p>TensorBoard is a great tool for visualizing the many metrics needed to evaluate TensorFlow model training. It used to be difficult to bring up, especially in hosted Jupyter Notebook environments such as Google Colab, Kaggle notebooks, and Coursera's notebooks. In this tutorial, I will show you how seamless it is to run and view TensorBoard right inside a hosted or local Jupyter notebook with the latest TensorFlow 2.0.</p>
<p>You can run this <a href="https://colab.research.google.com/gist/Tony607/7f55518ba7af13eb7e2e782b3b50a38b/tensorboard_in_notebooks.ipynb">Colab Notebook</a> while reading this post.</p>
<p><span>Start by installing TF 2.0 and loading the TensorBoard notebook extension:</span></p>
<div class="highlight">
<pre><span class="err">!</span><span class="n">pip</span> <span class="n">install</span> <span class="o">-</span><span class="n">q</span> <span class="n">tf</span><span class="o">-</span><span class="n">nightly</span><span class="o">-</span><span class="mf">2.0</span><span class="o">-</span><span class="n">preview</span>
<span class="c1"># Load the TensorBoard notebook extension</span>
<span class="o">%</span><span class="n">load_ext</span> <span class="n">tensorboard</span>
</pre>
</div>
<p><span>Alternatively, to run a local notebook, you can create a conda virtual environment and install TensorFlow 2.0.</span></p>
<div class="highlight">
<pre><span class="n">conda</span> <span class="n">create</span> <span class="o">-</span><span class="n">n</span> <span class="n">tf2</span> <span class="n">python</span><span class="o">=</span><span class="mf">3.6</span>
<span class="n">activate</span> <span class="n">tf2</span>
<span class="n">pip</span> <span class="n">install</span> <span class="n">tf</span><span class="o">-</span><span class="n">nightly</span><span class="o">-</span><span class="n">gpu</span><span class="o">-</span><span class="mf">2.0</span><span class="o">-</span><span class="n">preview</span>
<span class="n">conda</span> <span class="n">install</span> <span class="n">jupyter</span>
</pre>
</div>
<p>Then you can start TensorBoard before training, to monitor it in progress, from within the notebook using <a href="https://ipython.readthedocs.io/en/stable/interactive/magics.html" rel="nofollow">magics</a>.</p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">tensorflow</span> <span class="kn">as</span> <span class="nn">tf</span>
<span class="kn">import</span> <span class="nn">datetime</span><span class="o">,</span> <span class="nn">os</span>
<span class="n">logs_base_dir</span> <span class="o">=</span> <span class="s">"./logs"</span>
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">logs_base_dir</span><span class="p">,</span> <span class="n">exist_ok</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="o">%</span><span class="n">tensorboard</span> <span class="o">--</span><span class="n">logdir</span> <span class="p">{</span><span class="n">logs_base_dir</span><span class="p">}</span>
</pre>
</div>
<p>Right now you can see an empty TensorBoard view with the message "No dashboards are active for the current data set"; this is because the log directory is currently empty.</p>
<p>Let's create, train, and log some data with a very simple Keras model.</p>
<div class="highlight">
<pre><span class="k">def</span> <span class="nf">create_model</span><span class="p">():</span>
<span class="k">return</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">Sequential</span><span class="p">([</span>
<span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Flatten</span><span class="p">(</span><span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">28</span><span class="p">,</span> <span class="mi">28</span><span class="p">)),</span>
<span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">512</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">),</span>
<span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.2</span><span class="p">),</span>
<span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'softmax'</span><span class="p">)</span>
<span class="p">])</span>
<span class="k">def</span> <span class="nf">train_model</span><span class="p">():</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">create_model</span><span class="p">()</span>
<span class="n">model</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="n">optimizer</span><span class="o">=</span><span class="s">'adam'</span><span class="p">,</span>
<span class="n">loss</span><span class="o">=</span><span class="s">'sparse_categorical_crossentropy'</span><span class="p">,</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s">'accuracy'</span><span class="p">])</span>
<span class="n">logdir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">logs_base_dir</span><span class="p">,</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span><span class="o">.</span><span class="n">strftime</span><span class="p">(</span><span class="s">"%Y%m</span><span class="si">%d</span><span class="s">-%H%M%S"</span><span class="p">))</span>
<span class="n">tensorboard_callback</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">callbacks</span><span class="o">.</span><span class="n">TensorBoard</span><span class="p">(</span><span class="n">logdir</span><span class="p">,</span> <span class="n">histogram_freq</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x_train</span><span class="p">,</span>
<span class="n">y</span><span class="o">=</span><span class="n">y_train</span><span class="p">,</span>
<span class="n">epochs</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
<span class="n">validation_data</span><span class="o">=</span><span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">),</span>
<span class="n">callbacks</span><span class="o">=</span><span class="p">[</span><span class="n">tensorboard_callback</span><span class="p">])</span>
<span class="n">train_model</span><span class="p">()</span>
</pre>
</div>
<p>Now go back to the previous TensorBoard output, <span>refresh it with the button at the top right, and watch the view update.</span></p>
<p><img alt="tb1" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/b23e1a0f87d4653504646628dfc85759805fa1ef/images/tf2/tb1.png"/></p>
<p><span>The same TensorBoard backend is reused by issuing the same command. If a different logs directory was chosen, a new instance of TensorBoard would be opened. Ports are managed automatically.</span></p>
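<p><span>For example, re-issuing the magic with the same log directory reuses the running instance, while pointing it at a different directory (the path below is just a hypothetical example) would start a second instance:</span></p>
<div class="highlight">
<pre># Reuses the TensorBoard backend already serving ./logs
%tensorboard --logdir {logs_base_dir}

# A different directory (hypothetical) would start a new instance on a new port
%tensorboard --logdir ./other_logs</pre>
</div>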
<p><span>One new feature worth mentioning is the "<strong>conceptual graph</strong>". To see it, select the "keras" tag. For this example, you'll see a collapsed <strong>Sequential</strong> node. Double-click the node to see the model's structure:</span></p>
<p><img alt="tag_k" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/b23e1a0f87d4653504646628dfc85759805fa1ef/images/tf2/tag_k.png"/></p>
<h2><span><span>Conclusion and further reading</span></span></h2>
<p><span>In this quick tutorial, we walked through how to fire up and view a full-blown TensorBoard right inside Jupyter Notebook. For further instructions on how to leverage other new features of TensorBoard in TensorFlow 2.0, be sure to check out the resources below.</span></p>
<p><a href="https://www.tensorflow.org/tensorboard/r2/scalars_and_keras">TensorBoard Scalars: Logging training metrics in Keras</a></p>
<p><a href="https://www.tensorflow.org/tensorboard/r2/hyperparameter_tuning_with_hparams">Hyperparameter Tuning with the HParams Dashboard</a></p>
<p><a href="https://www.tensorflow.org/tensorboard/r2/what_if_tool">Model Understanding with the What-If Tool Dashboard</a></p>
<p></p>How to run TensorFlow object detection model faster with Intel Graphics2019-02-16T08:38:36+00:002024-03-19T04:08:41+00:00Chengweihttps://www.dlology.com/blog/author/Chengwei/https://www.dlology.com/blog/how-to-run-tensorflow-object-detection-model-faster-with-intel-graphics/<p><img alt="chip" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/8f4e0685b86b6367e3d68c4f2ee67e583712d155/images/object-detection/chip.png"/></p>
<p>In this tutorial, I will show you how to run inference of <a href="https://www.dlology.com/blog/how-to-train-an-object-detection-model-easy-for-free/">your custom trained TensorFlow object detection model</a> on Intel graphics at least 2x faster with the OpenVINO toolkit compared to the TensorFlow CPU backend. My benchmark also shows the solution is only 22% slower than the TensorFlow GPU backend with a GTX 1070 card.</p>
<p>If you are new to the OpenVINO toolkit, it is suggested to take a look at <a href="https://www.dlology.com/blog/how-to-run-keras-model-inference-x3-times-faster-with-cpu-and-intel-openvino-1/">the previous tutorial</a> on how to convert a Keras image classification model and accelerate its inference with OpenVINO. This time, we will take a step further with an object detection model.</p>
<h2>Prerequisites</h2>
<p>To convert a TensorFlow frozen object detection graph to OpenVINO <span>Intermediate Representation (IR) files, you will need these two files ready:</span></p>
<div>
<ul>
<li><span>The frozen TensorFlow object detection model, i.e. the `<strong>frozen_inference_graph.pb</strong>` file downloaded from <a href="https://colab.research.google.com/github/Tony607/object_detection_demo/blob/master/tensorflow_object_detection_training_colab.ipynb">Colab</a> after training.</span></li>
<li><span>The modified pipeline config file used for training, also downloaded from Colab after training; in our case, it is the `<strong>ssd_mobilenet_v2_coco.config</strong>` file.</span></li>
</ul>
<p><span>You can also download my <a href="https://github.com/Tony607/REPO/releases/download/V0.1/checkpoint.zip">copy</a> of those files from the GitHub release.</span></p>
<h3>OpenVINO model optimization</h3>
<p>Similar to the <a href="https://www.dlology.com/blog/how-to-run-keras-model-inference-x3-times-faster-with-cpu-and-intel-openvino-1/">previous image classification model</a>, you will specify a data type to quantize the model weights.<br/>The data type can be "FP16" or "FP32", depending on which device you want to run the converted model on.</p>
<ul>
<li>FP16: GPU and MYRIAD (Movidius neural compute stick)</li>
<li>FP32: CPU and GPU</li>
</ul>
<p>Generally speaking, an FP16 quantized model cuts the size of the weights in half and runs much faster, but may come with slightly degraded accuracy.</p>
<p>Another important file is the OpenVINO subgraph replacement configuration file that describes the rules to convert specific TensorFlow topologies. For the models downloaded from the TensorFlow Object Detection API model zoo, you can find the configuration files in the <code><INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf</code> directory.</p>
<p>Use:</p>
<ul>
<li><code>ssd_v2_support.json</code> - for frozen SSD topologies from the models zoo.</li>
<li><code>faster_rcnn_support.json</code> - for frozen Faster R-CNN topologies from the models zoo.</li>
<li><code>faster_rcnn_support_api_v1.7.json</code><span> </span>- for Faster R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.7.0 or higher.</li>
<li>...</li>
</ul>
<p>We will pick <code>ssd_v2_support.json</code> for this tutorial since ours is an SSD model.</p>
<p>With all the settings ready, we can run the model optimizer script.</p>
<div class="highlight">
<pre><span class="err">!</span><span class="n">python</span> <span class="p">{</span><span class="n">mo_tf_path</span><span class="p">}</span> \
<span class="o">--</span><span class="n">input_model</span> <span class="p">{</span><span class="n">pb_file</span><span class="p">}</span> \
<span class="o">--</span><span class="n">output_dir</span> <span class="p">{</span><span class="n">output_dir</span><span class="p">}</span> \
<span class="o">--</span><span class="n">tensorflow_use_custom_operations_config</span> <span class="p">{</span><span class="n">configuration_file</span><span class="p">}</span> \
<span class="o">--</span><span class="n">tensorflow_object_detection_api_pipeline_config</span> <span class="p">{</span><span class="n">pipeline</span><span class="p">}</span> \
<span class="o">--</span><span class="n">input_shape</span> <span class="p">{</span><span class="n">input_shape_str</span><span class="p">}</span> \
<span class="o">--</span><span class="n">data_type</span> <span class="p">{</span><span class="n">data_type</span><span class="p">}</span> \
</pre>
</div>
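<p>For reference, here is a minimal sketch of how the notebook variables used in the command above might be defined. All paths below are hypothetical; they depend on your OpenVINO install location and where you saved the training artifacts:</p>
<div class="highlight">
<pre>import os

# Hypothetical OpenVINO install path - adjust to your environment.
INSTALL_DIR = '/opt/intel/openvino'
mo_tf_path = os.path.join(INSTALL_DIR, 'deployment_tools/model_optimizer/mo_tf.py')

pb_file = './frozen_inference_graph.pb'      # frozen graph downloaded after training
pipeline = './ssd_mobilenet_v2_coco.config'  # the modified pipeline config file
configuration_file = os.path.join(
    INSTALL_DIR,
    'deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json')

output_dir = './models'
img_height = 300                             # matches the 300 x 300 pipeline config
input_shape_str = '[1,{},{},3]'.format(img_height, img_height)  # NHWC input shape
data_type = 'FP16'                           # FP16 for GPU/MYRIAD, FP32 for CPU</pre>
</div>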
<p>You can find the .xml and .bin files located in the specified <code>{output_dir}</code> after the conversion.</p>
<h2>Make predictions</h2>
<p>Loading the model with the OpenVINO toolkit is similar to the previous image classification example, but how we preprocess the inputs and interpret the outputs is different.</p>
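<p><span>As a rough sketch, assuming the 2019-era OpenVINO Python API (<code>IENetwork</code>/<code>IEPlugin</code>) used in the previous tutorial, loading the converted IR files could look like this; the file paths are hypothetical:</span></p>
<div class="highlight">
<pre>from openvino.inference_engine import IENetwork, IEPlugin

# Hypothetical paths to the IR files produced by the model optimizer step.
model_xml = './models/frozen_inference_graph.xml'
model_bin = './models/frozen_inference_graph.bin'

# Target device can be 'CPU', 'GPU' or 'MYRIAD'.
plugin = IEPlugin(device='GPU')
net = IENetwork(model=model_xml, weights=model_bin)

# The SSD network has a single input blob; grab its name for inference calls.
input_blob = next(iter(net.inputs))
exec_net = plugin.load(network=net)</pre>
</div>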
<p>For the image preprocessing, it is good practice to resize the image width and height to match what is defined in the `<strong>ssd_mobilenet_v2_coco.config</strong>` file, which is 300 x 300. Also, there is no need to normalize the pixel values to 0~1; just keep them as UINT8, ranging from 0 to 255.</p>
<p>Here is the preprocessing function.</p>
<div class="highlight">
<pre><span class="k">def</span> <span class="nf">pre_process_image</span><span class="p">(</span><span class="n">imagePath</span><span class="p">,</span> <span class="n">img_shape</span><span class="p">):</span>
<span class="sd">"""pre process an image from image path.</span>
<span class="sd"> </span>
<span class="sd"> Arguments:</span>
<span class="sd"> imagePath {str} -- input image file path.</span>
<span class="sd"> img_shape {tuple} -- Target height and width as a tuple.</span>
<span class="sd"> </span>
<span class="sd"> Returns:</span>
<span class="sd"> np.array -- Preprocessed image.</span>
<span class="sd"> """</span>
<span class="c1"># Model input format</span>
<span class="k">assert</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">img_shape</span><span class="p">,</span> <span class="nb">tuple</span><span class="p">)</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">img_shape</span><span class="p">)</span> <span class="o">==</span> <span class="mi">2</span>
<span class="n">n</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">img_shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">img_shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]]</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">Image</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">imagePath</span><span class="p">)</span>
<span class="n">processed_img</span> <span class="o">=</span> <span class="n">image</span><span class="o">.</span><span class="n">resize</span><span class="p">((</span><span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">),</span> <span class="n">resample</span><span class="o">=</span><span class="n">Image</span><span class="o">.</span><span class="n">BILINEAR</span><span class="p">)</span>
<span class="n">processed_img</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">processed_img</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">uint8</span><span class="p">)</span>
<span class="c1"># Change data layout from HWC to CHW</span>
<span class="n">processed_img</span> <span class="o">=</span> <span class="n">processed_img</span><span class="o">.</span><span class="n">transpose</span><span class="p">((</span><span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">processed_img</span> <span class="o">=</span> <span class="n">processed_img</span><span class="o">.</span><span class="n">reshape</span><span class="p">((</span><span class="n">n</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">))</span>
<span class="k">return</span> <span class="n">processed_img</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>
</pre>
</div>
<p>Now you can feed the preprocessed data to the network and get its prediction outputs as a dictionary containing the key "<strong>DetectionOutput</strong>".</p>
<div class="highlight">
<pre><span class="c1"># Run inference</span>
<span class="n">img_shape</span> <span class="o">=</span> <span class="p">(</span><span class="n">img_height</span><span class="p">,</span> <span class="n">img_height</span><span class="p">)</span>
<span class="n">processed_img</span><span class="p">,</span> <span class="n">image</span> <span class="o">=</span> <span class="n">pre_process_image</span><span class="p">(</span><span class="n">fname</span><span class="p">,</span> <span class="n">img_shape</span><span class="p">)</span>
<span class="n">res</span> <span class="o">=</span> <span class="n">exec_net</span><span class="o">.</span><span class="n">infer</span><span class="p">(</span><span class="n">inputs</span><span class="o">=</span><span class="p">{</span><span class="n">input_blob</span><span class="p">:</span> <span class="n">processed_img</span><span class="p">})</span>
<span class="k">print</span><span class="p">(</span><span class="n">res</span><span class="p">[</span><span class="s">'DetectionOutput'</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
<span class="c1"># Expect: (1, 1, 100, 7)</span>
</pre>
</div>
<p>The Inference Engine "<strong>DetectionOutput</strong>" layer produces one tensor with seven numbers for each detection; the seven numbers stand for:</p>
<ul>
<li>0: batch index</li>
<li>1: class label, defined in the label map <code>.pbtxt</code> file.</li>
<li>2: class probability</li>
<li>3: x_1 box coordinate (0~1, as a fraction of the image width, relative to the upper-left corner)</li>
<li>4: y_1 box coordinate (0~1, as a fraction of the image height, relative to the upper-left corner)</li>
<li>5: x_2 box coordinate (0~1, as a fraction of the image width, relative to the upper-left corner)</li>
<li>6: y_2 box coordinate (0~1, as a fraction of the image height, relative to the upper-left corner)</li>
</ul>
<p>Knowing this, we can easily filter the results with a prediction probability threshold and visualize them as bounding boxes drawn around the detected objects.</p>
<div class="highlight">
<pre><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="kn">as</span> <span class="nn">plt</span>
<span class="kn">import</span> <span class="nn">matplotlib.patches</span> <span class="kn">as</span> <span class="nn">patches</span>
<span class="n">probability_threshold</span> <span class="o">=</span> <span class="mf">0.5</span>
<span class="n">preds</span> <span class="o">=</span> <span class="p">[</span><span class="n">pred</span> <span class="k">for</span> <span class="n">pred</span> <span class="ow">in</span> <span class="n">res</span><span class="p">[</span><span class="s">'DetectionOutput'</span><span class="p">][</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="k">if</span> <span class="n">pred</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">></span> <span class="n">probability_threshold</span><span class="p">]</span>
<span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">image</span><span class="p">)</span> <span class="c1"># slice by z axis of the box - box[0].</span>
<span class="k">for</span> <span class="n">pred</span> <span class="ow">in</span> <span class="n">preds</span><span class="p">:</span>
<span class="n">class_label</span> <span class="o">=</span> <span class="n">pred</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">probability</span> <span class="o">=</span> <span class="n">pred</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Predict class label:{}, with probability: {}'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
<span class="n">class_label</span><span class="p">,</span> <span class="n">probability</span><span class="p">))</span>
<span class="n">box</span> <span class="o">=</span> <span class="n">pred</span><span class="p">[</span><span class="mi">3</span><span class="p">:]</span>
<span class="n">box</span> <span class="o">=</span> <span class="p">(</span><span class="n">box</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">image</span><span class="o">.</span><span class="n">shape</span><span class="p">[:</span><span class="mi">2</span><span class="p">][::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="mi">2</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">int</span><span class="p">)</span>
<span class="n">x_1</span><span class="p">,</span> <span class="n">y_1</span><span class="p">,</span> <span class="n">x_2</span><span class="p">,</span> <span class="n">y_2</span> <span class="o">=</span> <span class="n">box</span>
<span class="n">rect</span> <span class="o">=</span> <span class="n">patches</span><span class="o">.</span><span class="n">Rectangle</span><span class="p">((</span><span class="n">x_1</span><span class="p">,</span> <span class="n">y_1</span><span class="p">),</span> <span class="n">x_2</span><span class="o">-</span><span class="n">x_1</span><span class="p">,</span> <span class="n">y_2</span> <span class="o">-</span>
<span class="n">y_1</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">edgecolor</span><span class="o">=</span><span class="s">'red'</span><span class="p">,</span> <span class="n">facecolor</span><span class="o">=</span><span class="s">'none'</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">add_patch</span><span class="p">(</span><span class="n">rect</span><span class="p">)</span>
<span class="n">ax</span><span class="o">.</span><span class="n">text</span><span class="p">(</span><span class="n">x_1</span><span class="p">,</span> <span class="n">y_1</span><span class="p">,</span> <span class="s">'{:.0f} - {:.2f}'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">class_label</span><span class="p">,</span>
<span class="n">probability</span><span class="p">),</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">12</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'yellow'</span><span class="p">)</span>
</pre>
</div>
<p>Here is an example showing the object detection results.</p>
<p><img alt="vino_prediction" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/cf6f97dfce528de9c300a137a00a742dcaed6742/images/object-detection/vino_prediction.png"/></p>
<h2>Benchmark the inference speed</h2>
<p>Let's try the <strong>ssd_mobilenet_v2</strong> object detection model on various hardware and configurations; here is what we got.</p>
<p>The benchmark setup:</p>
<ul>
<li>Run inference 20 times and average the results (see the timing sketch after this list).</li>
<li>Input image shape: (300,300,3)</li>
</ul>
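<p><span>Here is the timing sketch referenced above: a minimal way to measure the average latency, assuming <code>exec_net</code>, <code>input_blob</code> and <code>processed_img</code> from the earlier steps:</span></p>
<div class="highlight">
<pre>import time

# Warm up once so one-time initialization cost is excluded from the average.
exec_net.infer(inputs={input_blob: processed_img})

times = []
for _ in range(20):
    start = time.time()
    exec_net.infer(inputs={input_blob: processed_img})
    times.append(time.time() - start)

avg = sum(times) / len(times)
print('Average inference time: {:.2f} ms ({:.1f} FPS)'.format(avg * 1000, 1 / avg))</pre>
</div>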
<p><img alt="benchmark" src="https://gitcdn.xyz/cdn/Tony607/blog_statics/8f4e0685b86b6367e3d68c4f2ee67e583712d155/images/object-detection/benchmark.png"/></p>
<p>As you can see, the OpenVINO model running on the Intel GPU with quantized weights achieves 50 FPS (frames per second), while the TensorFlow CPU backend only gets around 18.9 FPS.</p>
<p>Running the model with a Neural Compute Stick 2, either on Windows or a Raspberry Pi, also shows promising results.</p>
<p>You can run the <a href="https://github.com/Tony607/object_detection_demo/blob/master/deploy/openvino_inference_benchmark.py">openvino_inference_benchmark.py</a> and <a href="https://github.com/Tony607/object_detection_demo/blob/master/local_inference_test.py">local_inference_test.py</a> scripts if you want to reproduce the benchmark yourself.</p>
<h2>Conclusion and further reading</h2>
<p>This post walks you through how to convert a custom trained TensorFlow object detection model to OpenVINO format and run inference on various hardware and configurations. The benchmark results can help you decide the best fit for your edge inferencing scenario.</p>
<h4>Related materials you might find helpful</h4>
<p><span><a href="https://www.dlology.com/blog/how-to-run-keras-model-inference-x3-times-faster-with-cpu-and-intel-openvino-1/">How to run Keras model inference x3 times faster with CPU and Intel OpenVINO</a> - blog</span></p>
<p><span><a href="https://www.dlology.com/blog/how-to-train-an-object-detection-model-easy-for-free/">How to train an object detection model easy for free</a> - blog</span></p>
<p>The<a href="https://github.com/Tony607/object_detection_demo"> GitHub repository</a> for this post.</p>
</div>