As FPGAs gain prominence with the cross-industry penetration of cognitive technology, employing High Level Synthesizers as the electronic design automation (EDA) tool is a smart way to overcome the limitations of FPGA development.
Shaji N.M., Principal Architect at QuEST Global
Since they were first introduced in the 1980s, Field Programmable Gate Arrays (FPGAs) have suffered from an existential crisis: they are adaptable in the extreme, but difficult to program. Wary of the longer development time and effort, designers have, more often than not, forsaken the numerous virtues of FPGAs, chief among them versatility and superior performance.
Over the past few years, however, with the increase in computing power and favourable changes in logic architecture, FPGAs have been making faster inroads into the electronics industry, particularly with the rapid proliferation of cognitive technologies such as computer vision, deep learning, analytics and IoT.
In QuEST’s experience, employing High Level Synthesizers as the EDA tool for programming FPGAs dramatically increases engineer productivity, cuts development costs and greatly reduces time to market, particularly where the project demands a short turnaround time.
The productivity challenges in FPGAs
Present-day FPGAs are big enough to hold an entire digital system. Even though they offer very high processing capability, low power consumption and high I/O bandwidth, many architects do not opt for FPGAs. The major reason given is that productivity is higher in software development than in FPGA development. Productivity, in this context, means the amount of work an engineer completes per day. Several factors lower FPGA productivity relative to software; a few are listed here.
- Software is sequential in nature: only one instruction executes at any given point in time. In FPGA development, the designer has to be aware of hundreds of activities happening simultaneously.
- Software runs on fixed, proven hardware, and its maximum performance is limited by that hardware. For an FPGA, hardware optimized for the application has to be designed, which takes more effort.
- Software simulations are several times faster than register-transfer level (RTL) simulations.
- Development tools and coding standards are more mature and robust for software than for FPGAs.
- FPGA synthesis and implementation take much longer than compiling software.
- Debugging an FPGA takes more time, since the events occurring at each clock cycle and their interdependencies all need to be evaluated to fix a bug.
As FPGAs grew in size, the number of lines of code to be written increased tremendously, and the productivity gap between FPGA development and software development became more prominent. Further, in many domains such as computer vision and machine learning, the algorithms have been evolving rapidly, which has created a pressing need to reduce turnaround time.
The advantage of High Level Synthesizers
Digital hardware can be designed at various levels of abstraction. Currently, RTL is the most widely used level of abstraction and has become the de facto technique for FPGA design. At the RTL, the designer can describe the behavior of the module and also perceive the hardware that will be inferred from the code.
High Level Synthesis (HLS), on the other hand, raises the abstraction level from RTL to an algorithmic description written in C/C++. For FPGA designers, using a high-level language such as C instead of RTL is akin to software designers using high-level programming languages (C/C++) instead of assembly language.
Using HLS, a variety of hardware micro-architectures can be generated from one algorithmic description, based on tool directives. The information the compiler needs on how to infer the hardware is given through pragmas. Pragmas are available to control pipelining, loop optimizations, array optimizations, interface management and more.
Through algorithmic representation and pragma-based micro-architecture inference, HLS improves engineer productivity, an area in which FPGA development has a poor reputation. The engineering effort otherwise required to convert an algorithm into an implementable design and to develop the micro-architecture is saved. This shortens the development schedule and thereby improves time to market.
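As a minimal sketch of how one algorithmic description can lead to different micro-architectures, the same dot product is written twice below with different pragmas. The pragma spellings follow AMD/Xilinx Vitis HLS conventions, which is an assumption here since no particular tool is named; other HLS tools use different directives. The function names and the vector length are hypothetical. A regular C++ compiler ignores these pragmas (possibly with a warning), so both variants can be checked in software first.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical vector length, chosen only for illustration.
constexpr int kLen = 8;

// Variant A: request a pipelined loop that starts a new iteration every
// clock cycle (initiation interval of 1). Pragma spelling assumes Vitis HLS.
std::int32_t dot_pipelined(const std::int16_t a[kLen], const std::int16_t b[kLen]) {
    std::int32_t acc = 0;
    for (int i = 0; i < kLen; ++i) {
#pragma HLS PIPELINE II=1
        acc += static_cast<std::int32_t>(a[i]) * b[i];
    }
    return acc;
}

// Variant B: same algorithm, but the arrays are partitioned into individual
// elements and the loop is fully unrolled, which is expected to trade more
// multipliers for lower latency. Pragma spelling assumes Vitis HLS.
std::int32_t dot_unrolled(const std::int16_t a[kLen], const std::int16_t b[kLen]) {
#pragma HLS ARRAY_PARTITION variable=a complete dim=1
#pragma HLS ARRAY_PARTITION variable=b complete dim=1
    std::int32_t acc = 0;
    for (int i = 0; i < kLen; ++i) {
#pragma HLS UNROLL
        acc += static_cast<std::int32_t>(a[i]) * b[i];
    }
    return acc;
}

// Software-side check: the pragmas only steer the generated hardware,
// so both variants must compute the same value.
void check_variants_agree() {
    const std::int16_t a[kLen] = {1, 2, 3, 4, 5, 6, 7, 8};
    const std::int16_t b[kLen] = {8, 7, 6, 5, 4, 3, 2, 1};
    assert(dot_pipelined(a, b) == dot_unrolled(a, b));
}
```

The C++ body is identical in both variants; only the directives differ, which is what lets a designer explore area and latency trade-offs without rewriting the algorithm.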
QuEST’s Experience in using High Level Synthesizers
There are many applications where software developed for a processor or GPU has to be implemented on an FPGA to improve performance. Many of the video processing chains for computer vision and machine learning fall into this category and are well suited to High Level Synthesis.
For well-written, hardware-friendly code, implementing C code using HLS takes much less time than writing the corresponding RTL. However, the tool does not always take care of the conversion automatically. Converting typical C code written by an algorithm engineer into HLS-compatible C code often requires manual intervention and hardware knowledge.
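The kind of manual rework involved can be sketched as follows. The example is illustrative only and is not taken from the project described in this article: the function names, the moving-sum algorithm and the constants are all hypothetical. It assumes, as is commonly the case with HLS tools, that heap allocation and run-time-sized buffers are not synthesizable, so the hardware-friendly version uses fixed-size arrays, loops with compile-time bounds, and a small shift register in place of repeated memory reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Algorithm-style code as an algorithm engineer might write it:
// run-time-sized heap buffers and data-dependent loop bounds.
std::vector<std::int32_t> moving_sum_sw(const std::vector<std::int32_t>& in, int window) {
    std::vector<std::int32_t> out(in.size(), 0);
    for (std::size_t i = 0; i < in.size(); ++i) {
        for (int k = 0; k < window && k <= static_cast<int>(i); ++k) {
            out[i] += in[i - k];
        }
    }
    return out;
}

// Hypothetical compile-time limits for the hardware-friendly version.
constexpr int kMaxLen = 16;
constexpr int kWindow = 4;

// Hardware-friendly rewrite: fixed-size arrays, fixed trip counts, and a
// shift register so each input sample is read from memory only once.
// Entries of `out` at index `len` and beyond are left untouched.
void moving_sum_hw(const std::int32_t in[kMaxLen], std::int32_t out[kMaxLen], int len) {
    assert(len >= 0 && len <= kMaxLen);
    std::int32_t taps[kWindow] = {0, 0, 0, 0};
    for (int i = 0; i < kMaxLen; ++i) {      // bound known at compile time
        if (i < len) {
            for (int k = kWindow - 1; k > 0; --k) {
                taps[k] = taps[k - 1];       // shift older samples along
            }
            taps[0] = in[i];
            std::int32_t sum = 0;
            for (int k = 0; k < kWindow; ++k) {
                sum += taps[k];
            }
            out[i] = sum;
        }
    }
}
```

Keeping the original software version around as a reference model makes it straightforward to confirm that the restructured code still produces the same results before it is handed to the synthesis tool.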
Based on our experience in converting a 30-kernel signal processing chain from CUDA to FPGA, adopting an HLS-based approach can reduce the schedule and effort to about one-third of what it would otherwise have taken. However, the workflow for HLS-based development differs considerably from that of RTL-based development. The 30-kernel project provided new insights into HLS-based development, helping QuEST develop an optimized workflow based on what was learned.
According to major FPGA vendors, implementing a function in HLS takes much less time than implementing it in RTL. They also indicate that some HLS modules may perform worse than their HDL counterparts, but remain close to them. Our experience corroborates this claim. By using HLS, we were able to reduce development time and effort to around one-third, effectively tripling engineer productivity.
FPGAs are establishing a strong presence in high-performance systems. Many algorithms that previously ran on processors, DSPs or GPUs are now being retargeted to FPGAs, and system designers are looking for quicker ways to perform this conversion. A High Level Synthesizer is a tool that converts C code to an IP core or RTL with minimal intervention from the designer. It improves FPGA engineer productivity severalfold and reduces time to market, as our experience with FPGA implementation using HLS corroborates. In machine learning and computer vision applications, where performance is important, designers are using HLS-based FPGA implementation to improve overall performance. Though the tool is not yet widely adopted by the industry, it is sure to find many takers in the next few years, leading to a dramatic increase in productivity for FPGA development.