Shifted Gradient Similarity: A perceptual video quality assessment index for adaptive streaming encoding

Bibliographic details
Year of defense: 2016
Main author: MONTEIRO, Estêvão Chaves
Advisor: FERRAZ, Carlos André Guimarães
Defense committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: eng
Defending institution: Universidade Federal de Pernambuco
Graduate program: Programa de Pos Graduacao em Ciencia da Computacao (Graduate Program in Computer Science)
Department: Not provided by the institution
Country: Brazil
Keywords in Portuguese:
Access link: https://repositorio.ufpe.br/handle/123456789/17359
Abstract: Adaptive video streaming has become prominent due to the rising diversity of Web-enabled personal devices and the popularity of social networks. Common limitations in Internet bandwidth, decoding speed and battery power available in such devices challenge content encoders to preserve visual quality at reduced data rates over a wide range of display resolutions, typically compressing to less than 1% of the massive raw data rate. Furthermore, the human visual system does not uniformly perceive losses of spatial and temporal information, so a simple physical objective model such as the mean squared error does not correlate well with perceptual quality. Objective assessment and prediction of the perceptual quality of visual content have greatly improved in the past decade, but remain an open problem. Among the most relevant psychovisual quality metrics are the many versions of the Structural Similarity (SSIM) index. In this work, several of the most efficient SSIM-based metrics, such as the Multi-Scale Fast SSIM and the Gradient Magnitude Similarity Deviation (GMSD), are decomposed into their component techniques and reassembled in order to measure and understand the contribution of each technique and to develop improvements in quality and efficiency. The metrics are applied to the LIVE Mobile Video Quality and TID2008 databases, and the results are correlated with the subjective data included in the databases in the form of mean opinion scores (MOS), so each metric’s degree of correlation indicates its ability to predict perceptual quality. Additionally, the metrics’ applicability to the recent, relevant psychovisual rate-distortion optimization (Psy-RDO) implementation in the x264 encoder, which currently lacks an ideal objective assessment metric, is investigated.
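As an illustration of the evaluation step described above, the following is a minimal sketch of how a metric's scores might be correlated with MOS values. The abstract does not state which correlation coefficients were used; Pearson (linear) and Spearman (rank-order) are assumed here because they are common choices for this kind of comparison. The function names and the sample numbers are hypothetical and are not taken from the thesis or its databases.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks of the values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: one metric score and one MOS per test video.
metric_scores = [0.20, 0.50, 0.90, 0.95]
mos = [1.0, 2.5, 4.0, 4.8]

print("Pearson :", pearson(metric_scores, mos))
print("Spearman:", spearman(metric_scores, mos))
```

A higher absolute correlation suggests that the metric ranks (Spearman) or linearly tracks (Pearson) the subjective scores more closely. Published comparisons often fit a nonlinear mapping before computing Pearson; that step is omitted in this sketch.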
The “Shifted Gradient Similarity” (SG-Sim) index is proposed, with improved feature enhancement that avoids a common unintended loss of analysis information in SSIM-based indexes and achieves considerably higher MOS correlation than the existing metrics investigated in this work. More efficient spatial pooling filters are proposed as well: the decomposed 1-D integer Gaussian filter limited to two standard deviations, and the downsampling Box filter based on the integral image, which retain 99% and 98% equivalence, respectively, and achieve speed gains of 68% and 382%, respectively. In addition, the downsampling filter enables broader scalability, particularly for Ultra High Definition content, and defines the “Fast SG-Sim” index version. Furthermore, SG-Sim is found to improve correlation with Psy-RDO, as an ideal encoding quality metric for x264. Finally, the algorithms and experiments used in this work are implemented in the “Video Quality Assessment in Java” (jVQA) software, which is based on the AviSynth and FFmpeg platforms, designed for customization and extensibility, supports 4K Ultra-HD content, and is available as free, open-source code.
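To make the idea of a downsampling Box filter based on the integral image more concrete, here is a minimal sketch in plain Python. It is not the jVQA implementation (which is written in Java and may differ in block size, rounding and edge handling); the function names, the non-overlapping block layout and the discarding of partial edge blocks are assumptions made for illustration only.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros.

    ii[y][x] holds the sum of img[0:y][0:x], so any rectangle sum
    can later be read with four lookups.
    """
    height = len(img)
    width = len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_downsample(img, factor):
    """Average each non-overlapping factor x factor block into one output pixel.

    Assumption: rows and columns that do not fill a whole block are dropped.
    """
    height = len(img)
    width = len(img[0])
    ii = integral_image(img)
    area = factor * factor
    out = []
    for y0 in range(0, height - factor + 1, factor):
        y1 = y0 + factor
        row = []
        for x0 in range(0, width - factor + 1, factor):
            x1 = x0 + factor
            # Block sum from four corner lookups, independent of block size.
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            row.append(total / area)
        out.append(row)
    return out


# Hypothetical 4x4 luma block, downsampled by a factor of 2.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(box_downsample(image, 2))
```

Because each block sum costs four table lookups regardless of the block size, the per-output-pixel cost stays constant as the downsampling factor grows, which is one plausible reason such a filter scales well to high-resolution content.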