Dynamic fault tolerant mechanism for memory controllers

Bibliographic details
Year of defense: 2023
Main author: Stefani, Marco Pokorski
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Pontifícia Universidade Católica do Rio Grande do Sul
Escola Politécnica
Brazil
PUCRS
Programa de Pós-Graduação em Ciência da Computação
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://tede2.pucrs.br/tede2/handle/tede/11400
Abstract: Memory errors can cause failures, security vulnerabilities, data corruption, and data loss, all of which are unacceptable in server systems. These problems motivate the design of robust memory architectures. Memory controllers can mitigate such errors by applying an Error Correction Code (ECC) in the data write and read flows. Because environmental and technological factors lead to different error probabilities, it is not possible to determine at design time which ECC will be the most effective and efficient. This work proposes a fault-tolerant mechanism that acts as the manager of encoding and decoding in the memory controller. The mechanism dynamically selects the ECC for each memory block, using as criteria the error rate captured at runtime and the efficacy of the ECCs implemented in the controller. Memory blocks with a high error rate can be recoded to a higher-efficacy ECC, and vice versa. Experimental results show that our proposal achieves high error correction efficacy with high energy efficiency. Additionally, we developed the Absimth tool to analyze the efficacy and efficiency of proposals that employ dynamic fault tolerance management mechanisms. Absimth enables hardware/software modeling and verification at various levels of granularity, from in-memory applications to the operating system, including encoding and decoding processes that employ ECCs, which allows the efficacy and efficiency of the proposed solutions to be compared across a large number of scenarios.
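
Illustrative sketch (not part of the thesis record): the per-block selection described in the abstract can be pictured as a small manager that tracks the error rate of each memory block at runtime and moves the block up or down an ordered list of ECCs. Everything named here is an assumption made for illustration: the ECC labels in `ECC_LEVELS`, the windowed error-rate measurement, the threshold values, and the `EccManager` and `BlockState` classes. The thesis may use different ECCs, criteria, and recoding policy.

```python
from dataclasses import dataclass, field

# Hypothetical ECC schemes, ordered from lowest to highest efficacy (and cost).
# These labels are placeholders, not the set of ECCs used in the thesis.
ECC_LEVELS = ["parity", "secded", "strong"]


@dataclass
class BlockState:
    level: int = 0     # index into ECC_LEVELS currently assigned to the block
    accesses: int = 0  # accesses observed in the current window
    errors: int = 0    # errors observed in the current window


@dataclass
class EccManager:
    window: int = 100             # accesses per observation window (assumed)
    upgrade_rate: float = 0.05    # above this error rate, recode upward (assumed)
    downgrade_rate: float = 0.01  # below this error rate, recode downward (assumed)
    blocks: dict = field(default_factory=dict)

    def record_access(self, block_id: int, had_error: bool) -> str:
        """Record one access and return the ECC now assigned to the block."""
        state = self.blocks.setdefault(block_id, BlockState())
        state.accesses += 1
        state.errors += int(had_error)

        # Re-evaluate the block only once a full window has been observed.
        if state.accesses >= self.window:
            rate = state.errors / state.accesses
            if rate > self.upgrade_rate and state.level < len(ECC_LEVELS) - 1:
                state.level += 1  # high error rate: move to a stronger ECC
            elif rate < self.downgrade_rate and state.level > 0:
                state.level -= 1  # low error rate: move to a cheaper ECC
            state.accesses = 0
            state.errors = 0
        return ECC_LEVELS[state.level]
```

In this sketch a block is re-evaluated only at the end of an observation window, so a single transient error does not immediately trigger recoding. Whether the actual mechanism uses windows, counters, or another criterion is not stated in the abstract.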