Lina: a fast design optimisation tool for software-based FPGA programming

Bibliographic details
Year of defence: 2022
Main author: Perina, Andre Bannwart
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Biblioteca Digitais de Teses e Dissertações da USP
Graduate programme: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-23082022-101507/
Abstract: The continuous technology push in the semiconductor industry has led to the development of several alternative architectures for efficient computing. Field-Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) are examples of devices used to accelerate applications. FPGAs can provide massive parallelism for suitable tasks when properly programmed. However, designing for FPGAs is non-trivial and requires specific knowledge that differs from conventional software programming. To improve programmability, High-Level Synthesis (HLS) tools allow high-level languages such as C/C++/OpenCL to be used as input for FPGA design. However, early experiments and other studies in the literature demonstrate that significant code modification is still necessary to obtain minimally acceptable results. This undermines the democratisation and simplification that HLS tools seek to achieve. The major contribution of this thesis operates at the C/C++ level: a design space exploration tool built on an estimator named Lina. Based on Lin-analyzer, Lina uses a traced execution of software code to approximate the compilation behaviour of Vivado HLS, a C/C++ HLS compiler for Xilinx FPGAs. For a given C/C++ kernel, Lina provides a fast approximation of metrics such as execution time and FPGA resources occupied. Together with the HLS compiler optimisation directives that Lina supports in its estimation, our exploration method allows the optimisation of not only execution time but also FPGA resource usage. We then used Lina to optimise 16 C/C++ kernels from the PolyBench benchmark, and the estimated optimal solutions were among the best 1% of options. An average performance speedup of 14-16× was achieved, accounting for 70% of the reachable speedup when considering the traversed design spaces.
Additionally, Lina allows the exploration of off-chip memory transactions in search of optimisations such as coalescing, data packing, or to inform about potential HLS compiler limitations that could degrade performance.
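
Illustrative note (not part of the thesis): the kind of design space exploration described in the abstract, where directive combinations are scored on both execution time and resource usage, might be sketched as follows. Everything in this sketch is an assumption made for illustration: the directive knobs (unroll factor, pipelining, array partition factor), the constants, and above all `toy_estimate`, which is a made-up cost model and not Lina's estimator. The sketch only shows how a fast estimator lets every point be evaluated and the Pareto-optimal trade-offs be kept.

```python
from itertools import product

# Hypothetical directive knobs; the values are illustrative, not from the thesis.
UNROLL_FACTORS = (1, 2, 4, 8)
PIPELINE = (False, True)
PARTITION_FACTORS = (1, 2, 4, 8)

TRIP_COUNT = 1024   # assumed number of loop iterations
BODY_LATENCY = 6    # assumed cycles per iteration without pipelining


def toy_estimate(unroll, pipeline, partition):
    """Toy cost model (NOT Lina's estimator): returns (cycles, resource_units)."""
    # Assume dual-port memory banks, so memory ports cap the useful parallelism.
    parallel = min(unroll, 2 * partition)
    iterations = -(-TRIP_COUNT // parallel)  # ceiling division
    if pipeline:
        cycles = iterations + BODY_LATENCY - 1  # assumes an initiation interval of 1
    else:
        cycles = iterations * BODY_LATENCY
    resources = 10 * unroll + 4 * partition + (5 if pipeline else 0)
    return cycles, resources


def dominates(a, b):
    """True if cost pair a is no worse than b on both metrics and not identical."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def explore():
    """Evaluate every directive combination and keep the Pareto-optimal ones."""
    space = product(UNROLL_FACTORS, PIPELINE, PARTITION_FACTORS)
    points = {cfg: toy_estimate(*cfg) for cfg in space}
    front = {
        cfg: cost
        for cfg, cost in points.items()
        if not any(dominates(other, cost) for other in points.values())
    }
    return points, front


if __name__ == "__main__":
    _, front = explore()
    # List the surviving trade-offs from fastest to slowest.
    for cfg, (cycles, resources) in sorted(front.items(), key=lambda kv: kv[1]):
        print(cfg, cycles, resources)
```

Exhaustive enumeration is only practical here because each estimate is cheap; with full HLS compilation per point, the same loop would be far more costly, which is the motivation the abstract gives for a fast estimator.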