Kernel infrastructure for collecting failure event data on Linux (original title: Infraestrutura de kernel para coleta de dados de eventos de falha no Linux)
Year of defense: | 2024 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Master's dissertation |
Access type: | Embargoed access |
Language: | por |
Defending institution: | Universidade Federal de Uberlândia, Brazil, Programa de Pós-graduação em Ciência da Computação |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Access link: | https://repositorio.ufu.br/handle/123456789/44430 https://doi.org/10.14393/ufu.di.2024.774 |
Abstract: | Computing systems demand high reliability as they are intrinsically involved in various contexts that directly impact human activities. Failures, whether in user applications, services, or the operating system kernel, can range from minor inconveniences to disasters involving lives. Reliability is a fundamental metric to statistically quantify the level of trust one can place in software. Based on the observed importance of specific mechanisms for failure collection and analysis in systems like Windows, through the Reliability Analysis Component (RAC), the need for similar analyses for Linux was identified. For this reason, a kernel infrastructure, the Linux Reliability Analysis Component (LRAC), was created to enable the collection and storage of failure data within this operating system. This work focuses on investigating the mechanisms of General Protection Fault (GPF) and Page Fault (PF) failures and how they can be methodologically identified by LRAC. Violation conditions for x86 processors, which trigger these failures, were analyzed and applied to develop a new taxonomy aimed at making the classification of these failures more precise and less generic. A new data collection protocol was incorporated into LRAC to reflect these specificities. Subsequently, controlled tests were conducted to reproduce failure events and evaluate the new functionalities proposed for LRAC. The results demonstrated that distinct failure characteristics are often diagnosed generically by traditional Linux mechanisms and that the new functionalities proposed for LRAC were effective in distinguishing and classifying these differences. |