Deep learning algorithms are very useful for computer vision in applications such as image classification, object detection, or instance segmentation. The main drawback is that in most cases these algorithms need graphics processing units (GPUs) to be trained, and sometimes making predictions requires loading a heavy model. This is one of the main constraints when putting a deep learning model into production.
In this post, I will explain how to use state-of-the-art deep learning algorithms for object detection that can run in light environments such as a Raspberry Pi.
Object detection using Yolo V3
Most recent deep learning models are trained in either Tensorflow or Pytorch. Training a model requires determining a large number of parameters, but not all of them are used when doing inference (predictions). Tensorflow’s way of exporting a model is basically to identify the parts that are needed for inference (graph, weights, etc.) and export them in a format developed by Google called Protobuf.
This protobuf model is very light compared to the original Tensorflow model and can be used for simple inference tasks. The Google team released a model zoo repository with trained and optimized models that can be used for object detection applications.
Additionally, computer vision libraries like OpenCV can read protobuf files to make predictions, which removes the Tensorflow dependency when deploying a model. The OpenCV module that makes this possible is the DNN module, which implements the forward pass for deep networks.
One of the most widely used models for computer vision in light environments is Mobilenet. This convolutional model offers a trade-off between latency and accuracy. It can be found in the Tensorflow object detection zoo, where you can download the model and the configuration files. Now I will describe the main functions used for making predictions.
First we have to load the model into memory.
readNetFromTensorflow from the OpenCV DNN module loads the Tensorflow model from
a frozen protobuf file, together with its text configuration file, so that it can be used for out-of-the-box inference.
import cv2

# First argument: frozen graph with the weights; second: text graph configuration
model = cv2.dnn.readNetFromTensorflow(
    'models/ssd_mobilenet/frozen_inference_graph.pb',
    'models/ssd_mobilenet/ssd_mobilenet_v2_coco_2018_03_29.pbtxt')
Then, to obtain correct predictions from the model, you need to pre-process
your data. The OpenCV DNN module includes the function blobFromImage, which
creates a 4-dimensional blob from the image. It can also resize and crop the
image, subtract mean values, scale values by a given factor, swap the blue and red
channels, and more. To know more about blobs there is this good
model.setInput(cv2.dnn.blobFromImage(image, size=(300, 300), swapRB=True))
output = model.forward()
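To make the pre-processing step less of a black box, here is a minimal pure-Python sketch of the layout change behind blobFromImage. It is a simplified illustration and not OpenCV's implementation: it skips resizing and cropping, it works on nested lists instead of NumPy arrays, and the name `to_blob` is hypothetical.

```python
def to_blob(image, scale=1.0, mean=(0.0, 0.0, 0.0), swap_rb=True):
    """Turn an H x W x 3 image (nested lists, BGR) into a 1 x 3 x H x W blob."""
    # OpenCV loads images as BGR; swapping yields the RGB order the model expects.
    channel_order = (2, 1, 0) if swap_rb else (0, 1, 2)
    planes = []
    for out_index, src_index in enumerate(channel_order):
        # One full H x W plane per channel: subtract the mean, then scale.
        plane = [
            [(pixel[src_index] - mean[out_index]) * scale for pixel in row]
            for row in image
        ]
        planes.append(plane)
    # The outer list is the batch dimension (a single image here).
    return [planes]


# A tiny 1 x 2 image: one pure-blue and one pure-red pixel, in BGR order.
image = [[[255, 0, 0], [0, 0, 255]]]
blob = to_blob(image)
print(len(blob), len(blob[0]), len(blob[0][0]), len(blob[0][0][0]))  # prints: 1 3 1 2
```

The four printed numbers are the batch size, the number of channels, the height and the width, which is the 4-dimensional layout the network receives.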
The output of the model is an array of shape (1, 1, 100, 7). We are interested in output[0, 0], where the dimension with 100 values corresponds to the detected bounding boxes and the 7 values of each detection are the image id, the class id, the confidence score and the four bounding box coordinates. We can then filter the bounding boxes by confidence score.
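# THRESHOLD is a confidence cutoff between 0 and 1 that you choose
final_detection = list()
for detection in output[0, 0, :, :]:
    confidence = detection[2]  # index 2 holds the confidence score
    if confidence > THRESHOLD:
        final_detection.append(detection)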
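The filtered rows still need to be turned into something you can draw on the image. The sketch below shows one way to do it, assuming each row is laid out as [image_id, class_id, confidence, left, top, right, bottom] with box coordinates normalized to the range 0 to 1. The function name `parse_detections` and the sample rows are hypothetical.

```python
def parse_detections(rows, image_width, image_height, threshold=0.5):
    """Keep confident detections and convert their boxes to pixel coordinates."""
    results = []
    for row in rows:
        # Assumed row layout: [image_id, class_id, confidence, left, top, right, bottom]
        _, class_id, confidence, left, top, right, bottom = row
        if confidence <= threshold:
            continue
        # Coordinates are assumed to be fractions of the image size.
        box = (
            int(left * image_width),
            int(top * image_height),
            int(right * image_width),
            int(bottom * image_height),
        )
        results.append(
            {"class_id": int(class_id), "confidence": float(confidence), "box": box}
        )
    return results


# Two made-up rows standing in for output[0, 0]: one confident, one not.
rows = [
    [0, 1, 0.9, 0.25, 0.5, 0.75, 1.0],
    [0, 3, 0.2, 0.0, 0.0, 0.5, 0.5],
]
detections = parse_detections(rows, image_width=640, image_height=480)
print(detections)
```

With the real model output, `rows` would be `output[0, 0]`, and the image width and height would come from `image.shape`.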
Loading Mobilenet on a modern laptop takes about 0.5 seconds and inference takes 0.19 seconds, while on a Raspberry Pi loading takes 2.97 seconds on average and inference about 2.31 seconds. In real time, this gives the following output.
There are other light deep learning networks that perform well in object detection, like the YOLO detection system, whose model can be found on the official page. YOLOv3 is described as “extremely fast and accurate”, which is true: loading the tiny version of the model takes 0.091 seconds and inference takes 0.2 seconds. However, in my tests, Mobilenet performs a little bit better, as you can see in the following pictures.
Object detection using Tiny YoloV3
Object detection using SSD
The results of my tests can be found in the following table:
|Model loading||Inference||Model loading||Inference|
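Loading and inference times like the ones reported here can be measured with a small helper built on time.perf_counter. This is only a sketch of one way to do it, not necessarily how the numbers in this post were obtained; `average_seconds` and `fake_inference` are hypothetical names, and the sleep stands in for a real call such as model.forward().

```python
import time


def average_seconds(func, repeats=5):
    """Call func() `repeats` times and return the mean wall-clock time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


def fake_inference():
    # Hypothetical stand-in for model.forward(); sleeps for about 10 ms.
    time.sleep(0.01)


mean_time = average_seconds(fake_inference, repeats=3)
print(f"mean inference time: {mean_time:.3f} s")
```

Averaging over several runs matters on a small device, where the first call is often slower than the following ones.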
Using deep learning models in small environments like a Raspberry Pi is possible, and inference times are getting close to real time. The complete code can be found here for the Mobilenet model and here for YOLO.