Accelerated Face Detection on a cluster of 8 Alveo FPGA cards

Automatic object detection using machine learning is one of the most promising technologies in the domain of video classification and detection. Object detection in video is a computationally intensive task that requires a huge amount of processing power. Hardware accelerators based on FPGAs can provide the processing power required to increase the throughput of the application while significantly reducing its latency.

InAccel, a world pioneer in the domain of FPGA-based accelerators, has released today an integrated framework that makes it possible to use the power of an FPGA cluster for face detection. Specifically, InAccel has presented a demo in which a cluster of 8 FPGAs provides up to 1700 fps, enough to support up to 56 cameras at 30 fps each on a single server.

The Viola–Jones face detection algorithm is a widely used method for real-time object detection. Proposed in 2001 by Paul Viola and Michael Jones, the Viola–Jones object detection framework was the first to provide competitive object detection rates in real time. Although it can be trained to detect a variety of object classes, it was motivated primarily by the problem of face detection. It uses Haar-like features, which are inner products between the image and Haar templates. A face candidate is a rectangular section of the original image. As images may contain faces of different sizes, an image pyramid is constructed by repeatedly downscaling the image by a constant factor. This multiscale representation of the image is then searched for all possible 25×25 faces. Computing the inner product with a Haar template requires summing the pixels in different rectangular sections of the downscaled image.
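
The rectangle sums described above are commonly computed with an integral image (summed-area table), which makes every rectangle sum cost four lookups regardless of its size. The following is a minimal pure-Python sketch of that idea, not InAccel's or Cornell's implementation. The function names, the two-rectangle feature layout and the 1.2 downscaling factor are assumptions chosen for illustration; only the 25×25 window size comes from the text.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column at the top/left."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_haar(ii, top, left, height, width):
    """Hypothetical two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


def pyramid_scales(height, width, window=25, factor=1.2):
    """Scales at which the downscaled image still fits one window.

    The factor of 1.2 is an assumed value, not taken from the article.
    """
    scales = []
    scale = 1.0
    while height / scale >= window and width / scale >= window:
        scales.append(scale)
        scale *= factor
    return scales
```

In a full detector, each pyramid level would get its own integral image, and a cascade of such features would be evaluated at every 25×25 window position.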

Nitish Srivastava et al. from Cornell University have presented an implementation for a Xilinx Zynq device. Based on this implementation, InAccel has released today an integrated framework targeting the Xilinx Alveo cards that allows the face detection application to scale out across a cluster of 8 Alveo U200 FPGA cards, providing high performance for video applications.

FPGAs are adaptable hardware platforms that can offer great performance, low latency and reduced OpEx for applications like machine learning, video processing, quantitative finance, genomics, etc. However, easy and efficient deployment by users with no prior knowledge of FPGAs has been challenging.

InAccel provides an FPGA resource manager that allows instant deployment, scaling and resource management of FPGAs, making it easier than ever to use FPGAs for applications like machine learning, data processing, data analytics and many more. Users can deploy their applications from Python, Spark, Jupyter notebooks or even terminals.

In the case of face detection, the InAccel FPGA manager was used to scale out the application on a server with 8 FPGA cards. Software developers do not need to change the original code at all: the FPGA manager serializes the requests from the video streams and dispatches the jobs to the FPGA cluster. Using 8 FPGAs, we managed to achieve up to 1700 fps on a single server. That means a single server can support up to 56 cameras (assuming 30 fps each) while the CPU remains free for additional processing.

This application can be further scaled out to multiple servers through the Kubernetes plugin. For example, scaling out to 8 servers (64 Alveo U200 FPGA cards) can support up to 13,600 fps. The platform was deployed in a cluster provided by VMAccel.
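
The capacity figures quoted here follow from simple arithmetic, sketched below. The helper names are hypothetical, and the cluster estimate assumes throughput scales linearly with the number of servers, which is how the 13,600 fps figure is derived from 1700 fps per server.

```python
def cameras_supported(total_fps, fps_per_camera=30):
    """Whole number of camera streams that fit in the available throughput."""
    return total_fps // fps_per_camera


def cluster_fps(fps_per_server, servers):
    """Aggregate throughput, assuming linear scaling across servers."""
    return fps_per_server * servers


FPS_PER_SERVER = 1700  # peak figure for one server with 8 FPGA cards

print(cameras_supported(FPS_PER_SERVER))  # 56
print(cluster_fps(FPS_PER_SERVER, 8))     # 13600
```

Real deployments may see somewhat less than the linear estimate once network and scheduling overheads are included.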

The platform is available for demonstration purposes. If you are interested in deploying your application with multiple FPGA cards or running your applications in the cloud, contact us at info@inaccel.com.
