Object detection using a Raspberry Pi with Yolo and SSD Mobilenet

📅 Mar 6, 2019 ⏳ 3 mins read time

Deep learning algorithms are very useful for computer vision applications such as image classification, object detection, and instance segmentation. The main drawback is that, in most cases, these algorithms need graphics processing units (GPUs) to be trained, and even making predictions can require loading a heavy model. This is one of the main constraints when putting a deep learning model into production.

In this post, I will explain how to use state-of-the-art deep learning algorithms for object detection that can run in lightweight environments such as a Raspberry Pi.

Object detection using Yolo V3

OpenCV DNN module

Most recent deep learning models are trained in either Tensorflow or Pytorch. Training a model requires determining a large number of parameters, but not all of them are used when doing inference (predictions). Tensorflow’s way of exporting a model is basically to identify the parts that are needed for inference (graph, weights, etc.) and export them in Protobuf, a serialization format developed by Google.

This protobuf model is very light compared to the original Tensorflow model and can be used for simple inference tasks. The Google team released a model zoo repository with trained and optimized models that can be used for object detection applications.

Additionally, computer vision libraries like OpenCV can read protobuf files to make predictions, which removes the Tensorflow dependency when deploying a model. The OpenCV module that makes this possible is called the DNN module, which implements the forward pass for deep networks.

Mobilenet SSD

One of the most widely used models for computer vision in lightweight environments is Mobilenet. This convolutional model is designed as a trade-off between latency and accuracy. It can be found in the Tensorflow object detection zoo, where you can download the model and the configuration files. Now I will describe the main functions used for making predictions.

Load tensorflow model

First we have to load the model into memory. The function readNetFromTensorflow from the OpenCV DNN module takes the frozen protobuf graph (.pb) and its text graph configuration (.pbtxt) and returns a network that can be used for out-of-the-box inference.

import cv2

# Frozen inference graph (.pb) plus its text graph description (.pbtxt)
model = cv2.dnn.readNetFromTensorflow(
        'models/ssd_mobilenet/frozen_inference_graph.pb',
        'models/ssd_mobilenet/ssd_mobilenet_v2_coco_2018_03_29.pbtxt')

Blob image

Then, to obtain correct predictions from the model, you need to pre-process your data. The OpenCV DNN module includes the function blobFromImage, which creates a 4-dimensional blob from the image. It can also resize and crop the image, subtract mean values, scale values by a given factor, swap the blue and red channels, and more. To know more about blobs there is this good reference.

# Resize the image to 300x300, swap BGR to RGB, then run a forward pass
model.setInput(cv2.dnn.blobFromImage(image, size=(300, 300), swapRB=True))
output = model.forward()
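To make the idea of a blob more concrete, the sketch below reproduces only the layout change that happens when swapRB=True is used: the channel order is reversed and the image goes from height x width x channels to a batch of channels x height x width. This is a NumPy illustration, not OpenCV's implementation. The function name to_blob is hypothetical, and resizing, mean subtraction, and scaling are deliberately left out.

```python
import numpy as np


def to_blob(image_bgr, swap_rb=True):
    """Illustrative only: mimic the NCHW layout of a blob.

    Resizing, mean subtraction and scaling are deliberately omitted.
    """
    img = image_bgr.astype(np.float32)
    if swap_rb:
        img = img[:, :, ::-1]          # BGR -> RGB
    chw = img.transpose(2, 0, 1)       # HWC -> CHW
    return np.ascontiguousarray(chw[np.newaxis, ...])  # add batch axis -> NCHW


# A dummy 300x300 BGR frame stands in for a real image here.
frame = np.zeros((300, 300, 3), dtype=np.uint8)
frame[:, :, 0] = 255                   # pure blue in BGR order
blob = to_blob(frame)
print(blob.shape)                      # (1, 3, 300, 300)
```

After the swap, the blue values end up in the last channel of the blob, which is the order a network trained on RGB images expects.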

Filter detections

The output of the model is an array of shape (1, 1, 100, 7). We are interested in the slice output[0, 0], where the dimension with 100 values corresponds to the detected bounding boxes and each row of 7 values holds the image id, the class id, the confidence score, and the four bounding box coordinates. We can then filter the bounding boxes by confidence score.

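THRESHOLD = 0.5  # example value, not from the original post; tune as needed

final_detection = []
for detection in output[0, 0, :, :]:
    confidence = detection[2]  # index 2 holds the confidence score
    if confidence > THRESHOLD:
        final_detection.append(detection)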
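Once the detections are filtered, they usually need to be mapped back onto the image. The sketch below assumes each row is laid out as [image id, class id, confidence, x_min, y_min, x_max, y_max] with coordinates normalized to the range 0 to 1, which is how the OpenCV DNN module reports SSD detections as far as I know. The helper name to_pixel_box and the example row are made up for illustration.

```python
def to_pixel_box(detection, image_width, image_height):
    """Scale one detection row to integer pixel coordinates.

    Assumed row layout: [image_id, class_id, confidence,
                         x_min, y_min, x_max, y_max], coordinates in [0, 1].
    """
    class_id = int(detection[1])
    confidence = float(detection[2])
    x_min = int(detection[3] * image_width)
    y_min = int(detection[4] * image_height)
    x_max = int(detection[5] * image_width)
    y_max = int(detection[6] * image_height)
    return class_id, confidence, (x_min, y_min, x_max, y_max)


# Hypothetical detection row, made up for illustration.
example = [0.0, 1.0, 0.87, 0.25, 0.5, 0.75, 1.0]
print(to_pixel_box(example, image_width=640, image_height=480))
# (1, 0.87, (160, 240, 480, 480))
```

The resulting corner coordinates can be passed directly to a drawing function such as cv2.rectangle.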

Real time detection on Raspberry Pi

Loading Mobilenet on a modern laptop takes about 0.5 seconds and inference takes 0.19 seconds, while on the Raspberry Pi loading takes 2.97 seconds on average and inference takes about 2.31 seconds. The resulting real-time output is shown in the video that accompanies this post.

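The post does not show how these timings were taken. One common way to measure them, sketched here under that assumption, is to wrap the call of interest with time.perf_counter. The helper timed and the stand-in workload fake_forward are hypothetical; in real code the workload would be the model loading call or model.forward().

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_forward():
    """Stand-in workload; replace with model.forward() in real code."""
    return sum(i * i for i in range(100_000))


_, seconds = timed(fake_forward)
print(f"inference took {seconds:.4f} s")
```

Averaging over several runs, as the Raspberry Pi figures above suggest was done, gives more stable numbers than a single measurement.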

Yolo V3

There are other lightweight deep learning networks that perform well in object detection, like the YOLO detection system, whose model can be found on the official page. YOLOv3 is described as “extremely fast and accurate”. This matches my measurements on a PC, where loading the tiny version of the model takes 0.091 seconds and inference takes 0.2 seconds. However, in my tests Mobilenet performs a little better, as you can see in the following pictures.

Object detection using Tiny YoloV3

Object detection using SSD
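Post-processing for YOLO differs from the SSD case, because the raw output is organized differently. The sketch below assumes the row layout that the OpenCV DNN module produces for Darknet YOLO models as far as I know: [center_x, center_y, width, height, objectness, one score per class], with box values relative to the image size. The function name parse_yolo_rows and the sample rows are hypothetical, and non-maximum suppression, which is normally applied afterwards, is not shown.

```python
def parse_yolo_rows(rows, image_width, image_height, threshold):
    """Turn YOLO-style output rows into (class_id, score, box) tuples.

    Assumed row layout: [center_x, center_y, width, height, objectness,
    class_score_0, class_score_1, ...], with box values relative to the image.
    Boxes are returned as (x_min, y_min, width, height) in pixels.
    """
    results = []
    for row in rows:
        class_scores = list(row[5:])
        # Pick the class with the highest score for this row.
        class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
        score = class_scores[class_id]
        if score <= threshold:
            continue
        center_x, center_y = row[0] * image_width, row[1] * image_height
        width, height = row[2] * image_width, row[3] * image_height
        x_min = int(center_x - width / 2)
        y_min = int(center_y - height / 2)
        results.append((class_id, score, (x_min, y_min, int(width), int(height))))
    return results


# Two made-up rows with three classes each, for illustration only.
rows = [
    [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05],
    [0.2, 0.2, 0.1, 0.1, 0.3, 0.2, 0.1, 0.1],
]
detections = parse_yolo_rows(rows, image_width=640, image_height=480, threshold=0.5)
print(detections)
# [(1, 0.8, (240, 120, 160, 240))]
```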

The results of my tests can be found in the following table:

Device	Mobilenet SSD model loading (s)	Mobilenet SSD inference (s)	Tiny YOLOv3 model loading (s)	Tiny YOLOv3 inference (s)
PC	0.5	0.19	0.091	0.2
Raspberry Pi	2.97	2.31	0.6	3.0
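To relate these latencies to real-time use, the inference times can be converted to frames per second. The short calculation below uses the numbers from the table and assumes that inference is the only cost per frame, so it gives an upper bound rather than a measured frame rate.

```python
# Inference times in seconds, copied from the table above.
inference_seconds = {
    ("PC", "Mobilenet"): 0.19,
    ("PC", "Yolo"): 0.2,
    ("Raspberry Pi", "Mobilenet"): 2.31,
    ("Raspberry Pi", "Yolo"): 3.0,
}

# Upper bound on throughput if inference were the only cost per frame.
frames_per_second = {
    key: 1.0 / seconds for key, seconds in inference_seconds.items()
}

for (device, model), fps in frames_per_second.items():
    print(f"{device:13s} {model:10s} {fps:.2f} frames/s")
```

On the Raspberry Pi this works out to roughly 0.43 frames per second for Mobilenet and 0.33 for Yolo, compared with about 5 frames per second for both models on the PC.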

Conclusion

Using deep learning models in small environments like a Raspberry Pi is possible, and performance is getting close to real time. The complete code can be found here for the Mobilenet model and here for Yolo.

