Deep learning algorithms are very useful for computer vision in applications such as image classification, object detection, or instance segmentation. The main drawback is that these algorithms need in most cases graphical processing units to be trained and sometimes making predictions can require to load a heavy model. These is one of the main constrains when putting a deep learning model in production.
In this post, I will explain how to use state of the art deep learning algorithms for object detection that can run in light environments such as a Raspberrypi.
Object detection using Yolo V3
OpenCV DNN module
Most recent deep learning models are trained either in Tensorflow or Pytorch. Training a model requires to determine a high number of parameters, but not of them are used when doing inference (predictions). Tensorflow way to export a model is basically to identify the parameters that are need for inference (graph, weights, etc) and export them in a Google optimized format called Protobuf
These protobuf models are very light compared to the original tensorflow model and can be used for simple inference. Google team released a model zoo repository with trained and optimized models that can be use for object detection applications.
Additionally, computer vision libraries like OpenCV can handle protobuf files to make predictions and remove tensorflow dependency when deploying a model. The OpenCV module that make this possible is called DNN module, which implements forward pass for deep networks.
One of the more used models for computer vision in light environments is Mobilenet. This convolutional model as the trade-off between latency and accuracy. This model can be found in the tensorflow object detection zoo, where you can download model and the configuration files.
Load tensorflow model
readNetFromTensorflow from the DNN OpenCV module loads the tensorflow model and
a frozen protobuf file to be used for out of the box inference.
model = cv2.dnn.readNetFromTensorflow( 'models/ssd_mobilenet/frozen_inference_graph.pb', 'models/ssd_mobilenet/ssd_mobilenet_v2_coco_2018_03_29.pbtxt')
Then to obtain (correct) predictions from the model you need to preprocess your
data. OpenCV DNN modules includes the function
blobFromImage which creates a
4-dimensional blob from the image. Optionally resizes and crops image from
center, subtract mean values, scales values by scalefactor, swap Blue and Red
channels. To know more about blobs.
model.setInput(cv2.dnn.blobFromImage(image, size=(300, 300), swapRB=True)) output = model.forward()
The output of the models corresponds to an array of size (1, 1, 100, 7). We are interested in the results of the layer [0,0], where the 100 corresponds to the number of detected bounding boxes and 7 corresponds to the class id, the confidence score and the bounding box coordinates. We can then filter the bounding box that have a minimum confidence score.
final_detection = list() for detection in output[0, 0, :, :]: confidence = detection if confidence > THRESHOLD: final_detection.append(detection)
Real time detection on Raspberry pi
Loading Mobilenet in a modern laptop takes about 0.5 seconds and inference takes 0.19 seconds. While loading Mobilenet in Raspberry takes 2.97 seconds in average and inference time is about 2.31 seconds. Which is realtime gives the following output.
Your browser doesn't support HTML5 video.
There are other light deep learning networks that performs well in object detection like YOLO detection system, which model can be found on the official page. YOLOv3 is described as “extremely fast and accurate”. Which is true, because loading a model the tiny version takes 0.091 seconds and inference takes 0.2 seconds. However, from my tests Mobilenet performs a little bit better, like you can see in the following pictures.
Object detection using Tiny YoloV3
Object detection using SSD
The results of my tests can be found in the following table:
|Model loading||Inference||Model loading||Inference|