An energy-efficient hardware design for the 3D-HEVC motion estimation adopting reuse strategies for data and operations

Detalhes bibliográficos
Ano de defesa: 2020
Autor(a) principal: Perleberg, Murilo Roschildt
Orientador(a): Porto, Marcelo Schiavon
Banca de defesa: Não Informado pela instituição
Tipo de documento: Dissertação
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Universidade Federal de Pelotas
Programa de Pós-Graduação: Programa de Pós-Graduação em Computação
Departamento: Centro de Desenvolvimento Tecnológico
País: Brasil
Palavras-chave em Português:
Área do conhecimento CNPq:
Link de acesso: http://guaiaca.ufpel.edu.br/handle/prefix/6089
Resumo: Currently, there is a growing demand for video streaming through the internet, and also a crescent number of portable devices capable of capture and reproduce those videos. Moreover, the 3D videos allow an improved user experience when compared with traditional videos, since in the 3D videos the scene is simultaneously captured from different points of view. However, due to the amount of data required to represent digital videos, compression techniques are mandatory, which are a series of tools responsible for reducing the redundancies present in video data. In the 3D-High Efficiency Video Coding (3D-HEVC) standard, the most complex tool is the Motion Estimation (ME), while it is also responsible for a huge part of the compression efficiency of this standard. The ME is divided in Integer ME (IME), which performs the comparison a block from the frame being encoded with several candidate blocks from already encoded frames, searching for the candidate block most similar with the block being encoded, and the Fractional ME (FME), which performs a refinement around the candidate result of the IME. By default, the 3D-HEVC adopts the Test Zone Search (TZS) algorithm to select the candidates to be evaluated in the IME, since the TZS evaluates a reduced number of candidates without result in huge losses in image quality when compared with a full search algorithm, which compares all possible candidate blocks. Also, in 3D-HEVC the IME was applied in up to 24 different block sizes. This implies several redundant operations, where parts from a specific candidate can be compared with part of the block being encoded several times, besides a huge memory communication to perform the processing of several block sizes of each candidate block. Aiming at reducing these operation redundancies, it is possible to reuse the operations performed to small block sizes to compose the result of higher block sizes. This strategy also allows data reuse, since it reduces the memory access needed to obtain the samples to process all block sizes. There have only a few works on literature proposing hardware architectures for the IME of the 3D-HEVC standard. Between the IME works of other video coding standards, only a few presents solutions considering data and operations reuse strategies and a fast algorithm as TZS. Therefore, this work presents an IME hardware design adopting the TZS algorithm, with support to all block sizes supported by 3D-HEVC standard and with operations and data reuse strategies to take advantage of already processed operations and reduce the memory communication. The 3D-HEVC IME algorithm was modified aiming at an efficient hardware implementation, and the evaluations show that these modifications present an increase of 9.016% in the BD-rate metric. The developed IME architecture was synthesized for an ASIC with TSMC 40nm standard cells technology, and the results show that the hardware requires 269 K gates, while dissipates 108.48 mW when processing 3 views from different cameras, where each view is composed by the video of the two channels (Texture and Depth Maps) with FHD 1920x1080p resolution with 30 frames per second. The synthesis results have also indicated that the IME hardware design can process up to 3 views with UHD 3840x2160p resolution with 60 frames per second. Moreover, an FME architecture was also presented, which is able to evaluate all possible 48 fractional blocks around the IME result.