
Object detection using a Raspberry Pi with Yolo and SSD Mobilenet



Deep learning algorithms are very useful for computer vision applications such as image classification, object detection, or instance segmentation. The main drawback is that in most cases these algorithms need graphics processing units (GPUs) to be trained, and making predictions can sometimes require loading a heavy model. This is one of the main constraints when putting a deep learning model into production.

In this post, I will explain how to use state-of-the-art deep learning algorithms for object detection that can run in lightweight environments such as a Raspberry Pi.

Object detection using Yolo V3

OpenCV DNN module

Most recent deep learning models are trained in either Tensorflow or Pytorch. Training a model requires determining a large number of parameters, but not all of them are used when doing inference (predictions). Tensorflow's way of exporting a model is basically to identify the parts that are needed for inference (graph, weights, etc.) and export them in Protobuf, a serialization format developed by Google.

This protobuf model is very light compared to the original Tensorflow model and can be used for simple inference tasks. The Google team released a model zoo repository with trained and optimized models that can be used for object detection applications.

Additionally, computer vision libraries like OpenCV can read protobuf files to make predictions, which removes the Tensorflow dependency when deploying a model. The OpenCV module that makes this possible is the DNN module, which implements the forward pass for deep networks.

Mobilenet SSD

One of the most widely used models for computer vision in lightweight environments is Mobilenet. This convolutional model offers a trade-off between latency and accuracy. It can be found in the Tensorflow object detection zoo, where you can download the model and the configuration files. Now I will describe the main functions used for making predictions.

Load tensorflow model

First we have to load the model into memory. The function readNetFromTensorflow from the OpenCV DNN module loads the frozen protobuf file (.pb) together with its text graph configuration (.pbtxt), so the model can be used for out-of-the-box inference.

import cv2

# Frozen graph (weights) plus the text description of the network
model = cv2.dnn.readNetFromTensorflow(
        'models/ssd_mobilenet/frozen_inference_graph.pb',
        'models/ssd_mobilenet/ssd_mobilenet_v2_coco_2018_03_29.pbtxt')

Blob image

Then, to obtain correct predictions from the model, you need to pre-process your data. The OpenCV DNN module includes the function blobFromImage, which creates a 4-dimensional blob from the image. It can also resize and crop the image, subtract mean values, scale values by a given factor, swap the blue and red channels, and more. To know more about blobs there is this good reference.

model.setInput(cv2.dnn.blobFromImage(image, size=(300, 300), swapRB=True))
output = model.forward()
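
To make the blob layout concrete, here is a minimal pure-Python sketch of the reordering that happens when swapRB=True. It is not OpenCV's implementation: the helper name to_blob_layout is hypothetical, the image is a plain nested list, and resizing, mean subtraction and scaling are left out.

```python
def to_blob_layout(image_bgr, swap_rb=True):
    """Reorder a height x width x 3 image (BGR) into a 1 x 3 x height x width blob.

    Illustrative only: shows the layout change, not resizing or scaling.
    """
    height = len(image_bgr)
    width = len(image_bgr[0])
    # With swap_rb the output channels are R, G, B instead of B, G, R
    channel_order = (2, 1, 0) if swap_rb else (0, 1, 2)
    planes = [
        [[image_bgr[y][x][c] for x in range(width)] for y in range(height)]
        for c in channel_order
    ]
    # The extra outer list is the batch dimension (a single image here)
    return [planes]
```

For a 300 x 300 input this layout has shape (1, 3, 300, 300), which is the shape of the blob passed to setInput above.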

Filter detections

The output of the model is an array of shape (1, 1, 100, 7). We are interested in output[0, 0], where the dimension with 100 values corresponds to the detected bounding boxes and the 7 values of each box hold an image id, the class id, the confidence score and the four bounding box coordinates. We can then filter the bounding boxes by confidence score.

THRESHOLD = 0.5  # example value, not from the original post; tune as needed
final_detection = list()
for detection in output[0, 0, :, :]:
    confidence = detection[2]
    if confidence > THRESHOLD:
        final_detection.append(detection)
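
The filtered detections still need to be mapped onto the image before they can be drawn. The sketch below assumes the usual OpenCV DNN layout for SSD outputs, where each row is [image id, class id, confidence, left, top, right, bottom] and the coordinates are normalized to the range 0 to 1. The helper name to_pixel_box is hypothetical.

```python
def to_pixel_box(detection, image_width, image_height):
    """Convert one SSD detection row into a class id, a score and a pixel box.

    Assumes the row is [image_id, class_id, confidence, left, top, right, bottom]
    with box coordinates normalized to [0, 1].
    """
    class_id = int(detection[1])
    confidence = float(detection[2])
    left = int(detection[3] * image_width)
    top = int(detection[4] * image_height)
    right = int(detection[5] * image_width)
    bottom = int(detection[6] * image_height)
    return class_id, confidence, (left, top, right, bottom)
```

The resulting corners can then be passed to a drawing call such as cv2.rectangle, together with the class id as a label.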

Real time detection on Raspberry Pi

Loading Mobilenet on a modern laptop takes about 0.5 seconds and inference takes 0.19 seconds, while on the Raspberry Pi loading takes 2.97 seconds on average and inference takes about 2.31 seconds. Running in real time, this gives the following output.

[Video: real-time object detection on the Raspberry Pi]

Yolo V3

There are other lightweight deep learning networks that perform well in object detection, like the YOLO detection system, whose model can be found on the official page. YOLOv3 is described as “extremely fast and accurate”, which is true: loading the tiny version of the model takes 0.091 seconds and inference takes 0.2 seconds. However, in my tests Mobilenet performs a little better, as you can see in the following pictures.
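
YOLO outputs are laid out differently from the SSD output. As a rough sketch, assuming the layout OpenCV's DNN module typically returns for Darknet YOLOv3 models (each row is [center x, center y, width, height, objectness, one score per class], with box values relative to the image size), a single row could be decoded as follows. The helper name decode_yolo_row is hypothetical, and taking the highest class score as the confidence is one common choice rather than the only one.

```python
def decode_yolo_row(row, image_width, image_height):
    """Decode one YOLO output row into a class id, a score and a pixel box.

    Assumes row = [center_x, center_y, width, height, objectness, *class_scores]
    with box values relative to the image size.
    """
    center_x, center_y, box_width, box_height = row[:4]
    class_scores = list(row[5:])
    # Pick the class with the highest score
    class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
    confidence = class_scores[class_id]
    # Convert from a relative center/size box to a pixel top-left/size box
    left = int((center_x - box_width / 2) * image_width)
    top = int((center_y - box_height / 2) * image_height)
    width = int(box_width * image_width)
    height = int(box_height * image_height)
    return class_id, confidence, (left, top, width, height)
```

Rows decoded this way can be filtered by confidence in the same way as the Mobilenet detections.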

Object detection using Tiny YoloV3

Object detection using SSD

The results of my tests can be found in the following table:

All times are in seconds.

Device         Mobilenet                    Yolo
               Model loading   Inference    Model loading   Inference
PC             0.5             0.19         0.091           0.2
Raspberry Pi   2.97            2.31         0.6             3.0
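
The post does not show how these timings were collected. One simple way to take comparable measurements is to average the wall-clock time of a call over a few runs, as in this sketch. The helper name average_time and the default number of repeats are assumptions.

```python
import time


def average_time(fn, *args, repeats=5, **kwargs):
    """Call fn several times and return its last result and the mean duration in seconds."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return result, sum(durations) / len(durations)
```

With the model from the Mobilenet section, inference time could be measured with average_time(model.forward), and loading time by wrapping the cv2.dnn.readNetFromTensorflow call in the same way.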

Conclusion

Using deep learning models in small environments like a Raspberry Pi is possible, and performance is getting close to real time. The complete code can be found here for the Mobilenet model and here for Yolo.
