Ph.D. Research

In short, my research was about accelerating automated design space exploration of high-level synthesis applications on reconfigurable platforms.

That certainly is a mouthful… Let’s see what this means.

Accelerating a software application using custom hardware used to be a time-consuming process. Developers commonly used register-transfer level (RTL) languages like Verilog and VHDL. Although these languages give precise control over the exact hardware that is generated, it takes months to write a complex hardware component such as a processor. To illustrate the verbosity of VHDL, a description of a module with just one bit of memory is given below.

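library ieee;
use ieee.std_logic_1164.all;

entity reg is
  port (clk, d : in std_logic; q : out std_logic);
end reg;

architecture behav of reg is
begin
  -- Store d in q on every rising clock edge.
  process (clk) begin
    if rising_edge(clk) then
      q <= d;
    end if;
  end process;
end behav;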

Thanks to advances in technology, writing in RTL languages is fortunately rarely necessary anymore. High-level synthesis (HLS) tools like Vivado HLS from Xilinx allow us to write code in C or C++. Since many software developers are already familiar with C and C++, you don’t have to be a hardware expert anymore to write hardware! You write an application in C or C++. You can compile and test it using generic C/C++ compilers such as GCC and Visual Studio. Similarly, debugging can be done with existing tools like GDB.

Once you are satisfied with the functionality, you run SDSoC on the code. SDSoC (nowadays Vitis) compiles the code into hardware and software. Under the hood, SDSoC employs Vivado HLS to generate the hardware. The generated software and hardware are suitable for a heterogeneous platform. Heterogeneous refers to the fact that the hardware consists of dissimilar components, namely microprocessors and programmable logic.

You may wonder what programmable logic is. A regular microchip has a fixed purpose. In the case of a microprocessor, the purpose is to execute programs. Now, let’s say we want to use it for a different purpose, say to store large amounts of data like a memory. That is impossible with a regular chip. Once it has left the factory, it cannot be altered to provide different functionality.

Every computer chip, from GPU to DRAM, consists of millions of smaller components, gates or logic, that are connected by wires. If we could change the wires, we could turn a microprocessor into a memory. That is where programmable logic comes in. The simple wires between the gates are replaced with wires that can be connected and disconnected electronically. To change the function of the chip, we merely have to reprogram the programmable logic. Chips with programmable logic are commonly called field-programmable gate arrays (FPGAs). Adding programmability does not come for free: FPGAs are slower than dedicated hardware. A typical microprocessor runs at 3 GHz these days, but when the same design is programmed on an FPGA, even frequencies of 800 MHz are hard to achieve.

Although compiling code without modifications results in functional hardware, this hardware is usually not satisfactory because all operations are executed in series, similar to a microprocessor, resulting in poor performance. The real benefit of an FPGA becomes apparent when we insert pragmas in the code. A regular software compiler will ignore unrecognized pragmas in accordance with the C/C++ standards. Vivado HLS will use these same pragmas to select a different hardware implementation.

An example of a pragma is the unroll pragma. It controls how much a loop is unrolled. Unrolled loop iterations can be executed in parallel, thereby reducing the runtime of the application. Why do we not unroll every loop if it benefits the performance? Because unrolling also increases the number of FPGA resources that are used. An FPGA has a limited number of physical resources, so too much unrolling prevents Vivado HLS from generating a feasible hardware description.
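As a minimal sketch of what this looks like in source code, the function below sums a small array and asks for the loop to be unrolled by a factor of 4. The function name, the array length, and the unroll factor are assumptions chosen for illustration; only the `#pragma HLS unroll` directive itself comes from Vivado HLS.

```cpp
#include <cstddef>

constexpr std::size_t N = 16;  // hypothetical fixed array length

// Sums N integers.
// In Vivado HLS, the pragma requests four copies of the loop body, which
// the tool can schedule in parallel where data dependencies allow.
// A regular compiler such as GCC ignores the pragma (possibly with a
// warning), so the function behaves like ordinary C++.
int sum_array(const int data[N]) {
    int total = 0;
    for (std::size_t i = 0; i < N; ++i) {
#pragma HLS unroll factor=4
        total += data[i];
    }
    return total;
}
```

A larger factor gives more parallelism at the cost of more adders, which is exactly the tradeoff described above.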

In addition to the unroll pragma, there are other pragmas. The pipeline pragma, for instance, generates a pipeline of hardware units. This again results in more parallelism and better performance. However, we again have to watch out not to run out of resources. Each pragma presents a tradeoff between performance and resource consumption. This makes selecting the right pragmas for a large design complex. And that is not even where our choices end… The hardware tools have command-line options that affect the compilation too.
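The pipeline pragma can be sketched in the same way. The example below is an illustration under assumptions: the function name, the vector length, and the choice of an initiation interval (`II`) of 1 are made up for this page. With `II=1`, Vivado HLS tries to start a new loop iteration every clock cycle.

```cpp
#include <cstddef>

constexpr std::size_t LEN = 8;  // hypothetical vector length

// Element-wise multiply-add: out[i] = a[i] * b[i] + c.
// The pragma asks Vivado HLS to pipeline the loop so that a new iteration
// can start every clock cycle (initiation interval of 1) while earlier
// iterations are still in flight. A regular compiler ignores it.
void scale_add(const int a[LEN], const int b[LEN], int c, int out[LEN]) {
    for (std::size_t i = 0; i < LEN; ++i) {
#pragma HLS pipeline II=1
        out[i] = a[i] * b[i] + c;
    }
}
```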

By now, you probably wonder where this story is leading. In my research, I try to select the optimal pragmas, command-line options, and other design choices one might encounter, for a design targeted at a given FPGA. In an abstract way, we can represent each choice by a variable, the design parameter. Together, all design parameters describe a multidimensional space, the design space. Each point in this space represents one design instance. Locating the best design is called design space exploration. Which design is best depends on the objective set by the developer, but it is often minimizing latency or maximizing throughput while ensuring that resource consumption remains within prescribed limits.
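To make these terms concrete, here is a toy sketch of exhaustive design space exploration with two design parameters. Everything in it is an assumption for illustration: the parameter names, the value ranges, and especially the `toy_build` cost model, which stands in for a real hardware build and is not based on measured data. It is not the method used in my research, only the simplest possible baseline.

```cpp
#include <initializer_list>

// One point in the design space: a value for each design parameter.
struct DesignPoint {
    int unroll_factor;  // hypothetical design parameter
    bool pipelined;     // hypothetical design parameter
};

// Outcome of building one design (arbitrary units).
struct BuildResult {
    double latency;    // lower is better
    double resources;  // must stay within the limit
};

// Made-up cost model standing in for a real build: unrolling and
// pipelining lower the latency and raise the resource consumption.
BuildResult toy_build(const DesignPoint& p) {
    double latency = 1024.0 / p.unroll_factor;
    double resources = 10.0 * p.unroll_factor;
    if (p.pipelined) {
        latency /= 4.0;
        resources += 25.0;
    }
    return {latency, resources};
}

// Visits every point and keeps the lowest-latency design whose resource
// consumption stays within resource_limit.
// Returns false if no design is feasible; otherwise writes it to best.
bool explore(double resource_limit, DesignPoint& best) {
    bool found = false;
    double best_latency = 0.0;
    for (int unroll : {1, 2, 4, 8, 16}) {
        for (bool pipelined : {false, true}) {
            DesignPoint p{unroll, pipelined};
            BuildResult r = toy_build(p);
            if (r.resources > resource_limit) continue;  // infeasible
            if (!found || r.latency < best_latency) {
                best = p;
                best_latency = r.latency;
                found = true;
            }
        }
    }
    return found;
}
```

Under this toy model and a resource limit of 100, the best design unrolls by 4 and pipelines the loop. Unrolling by 8 with pipelining would be faster but exceeds the limit.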

Design spaces can easily encompass billions of points because the number of points increases exponentially as the number of design parameters increases. What makes design space exploration even more challenging is that building hardware for a single design can easily take hours in SDSoC. Hence, it becomes clear that we need an automated solution that selects new points in an intelligent way to minimize the time needed to find the best design. This automated design space exploration is also called autotuning.
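The exponential growth is easy to verify with a few lines of code. The helper below is a sketch with a made-up name; it multiplies the number of options of each design parameter to obtain the number of points in the design space.

```cpp
#include <cstdint>
#include <vector>

// Number of points in a design space: the product of the number of
// options available for each design parameter.
std::uint64_t design_space_size(
        const std::vector<std::uint64_t>& options_per_parameter) {
    std::uint64_t size = 1;
    for (std::uint64_t options : options_per_parameter) {
        size *= options;  // every extra parameter multiplies the space
    }
    return size;
}
```

For example, ten design parameters with eight options each (hypothetical numbers) already give 8^10 = 1,073,741,824 points. Even at one hour per build, visiting all of them is out of the question.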

Autotuning compilation of HLS applications is not novel. However, we believe that there is room to make autotuning faster. The goal of my research is to reduce the time needed to discover a design that satisfies given requirements.
