FD-Sensi: an adaptive failure detector and its application to a large-scale distributed system

Bibliographic details
Year of defense: 2009
Main author: Everthon Valadao dos Santos
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal de Minas Gerais
UFMG
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/SLSS-7XGFF4
Abstract: The failure detector is an essential component of any dependable distributed system. An ideal failure detector must adapt to varying network and system conditions in order to provide fast and accurate information about faulty nodes to the other modules of a dependable system. This work presents a new adaptive failure detection algorithm, FD-Sensi, which copes with heavily loaded distributed systems and networks across a wide range of message delay scenarios. We evaluated the algorithm in an Internet scenario, using data collected from one hundred PlanetLab nodes. The data were used to compare its performance with that of Adaptive Accrual, one of the best failure detection algorithms currently available. Our results show that FD-Sensi outperformed Adaptive Accrual, significantly reducing the number of false positives while maintaining a low average detection time. The trace collected on PlanetLab may be used to evaluate new failure detection algorithms; through its analysis, this work also identifies the statistical distributions that best fit network delays in globally distributed environments. Finally, we propose a technique for improving detection algorithms that exploits the correlation between the resource load of the monitored node and the observed delays, which allowed us to significantly improve the accuracy and speed of failure detection.