Accelerated Inference on FPGA clusters using InAccel: MLPerf results

Evaluating the best hardware platform for a deep learning application can be challenging, and the marketing numbers published by vendors can be misleading when they refer to specialized benchmarks. MLPerf’s mission is to build fair and useful benchmarks for measuring the training and inference performance of ML hardware, software, and services. A widely accepted benchmark suite benefits the entire community, including developers, manufacturers, machine learning engineers, application providers, and end users. MLPerf provides a standards-based benchmark that allows the performance of several hardware platforms to be compared.

MLPerf 0.7 Inference results on a cluster of 2 Alveo U250 FPGA cards using InAccel orchestrator

InAccel has today released MLPerf results for deep learning inference on a cluster of Xilinx Alveo U250 FPGA cards. Through its unique FPGA orchestrator, InAccel is the only company to have released inference results for a cluster of FPGAs rather than for single FPGA instances. InAccel integrated the Quantized ResNet50 Dataflow engine from Xilinx Research Labs into the FPGA orchestrator, allowing the inference engine to scale across multiple FPGA cards.

InAccel’s orchestrator provides easy deployment, scaling, resource management, and task scheduling for FPGAs, making it easier than ever to deploy and utilize FPGAs for deep learning inference.

The orchestrator simplifies integration with high-level frameworks and enables easy deployment, sharing among multiple users, and automatic resource management. Using the InAccel orchestrator, the ResNet50 inference engine can be scaled out to up to 16 FPGA cards per server. The latest MLPerf results released today show up to 6008 fps on a cluster of 2 Alveo U250 cards (Offline scenario). The orchestrator also reduces latency where it is critical: in the Single-Stream scenario, latency drops to as low as 7 ms, making it ideal for applications where latency is of paramount importance.
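The scaling pattern described above can be sketched in plain Python. This is a hypothetical illustration, not InAccel’s actual API: each "card" is modeled as a worker draining inference batches from a shared queue, which is the essence of orchestrating one engine across multiple FPGAs.

```python
from concurrent.futures import ThreadPoolExecutor
import queue

# Hypothetical sketch -- all names here are illustrative, not InAccel's API.
# Each "FPGA card" is a worker that pulls inference batches from a shared
# queue, showing how one engine scales transparently across many cards.

NUM_CARDS = 2  # e.g. two Alveo U250 cards
BATCHES = [list(range(i, i + 4)) for i in range(0, 32, 4)]  # fake image batches

tasks = queue.Queue()
for batch in BATCHES:
    tasks.put(batch)

def fpga_worker(card_id):
    """Drain batches from the shared queue, simulating one card's engine."""
    processed = 0
    while True:
        try:
            batch = tasks.get_nowait()
        except queue.Empty:
            return processed
        # A real orchestrator would DMA the batch to the card and run the
        # ResNet50 dataflow engine; here we only count images to show the
        # scheduling pattern.
        processed += len(batch)

with ThreadPoolExecutor(max_workers=NUM_CARDS) as pool:
    counts = list(pool.map(fpga_worker, range(NUM_CARDS)))

print(sum(counts))  # all 32 images are processed across both "cards"
```

Adding more cards means raising `max_workers`: the shared-queue design lets throughput grow with the number of cards without any change to the client code, which is the property the Offline scenario measures.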

Users can evaluate the available inference accelerator for free through the unique InAccel studio. Using a familiar Jupyter-based portal, they can evaluate the inference engine and test it with their own datasets.

Interested users can contact InAccel to learn how they can use the inference engine on a cluster of FPGAs.