Using deep neural networks for failure prediction in hard disk drives

Bibliographic details
Year of defense: 2018
Main author: Lima, Fernando Dione dos Santos
Advisor: Not provided by the institution
Examining committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: English
Defending institution: Not provided by the institution
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Access link: http://www.repositorio.ufc.br/handle/riufc/49628
Abstract: Hard disk drives (HDDs) are still the most widely used storage technology in large-scale storage systems, mainly because of their favorable cost per gigabyte. Several research efforts have proposed early failure detection techniques for these devices in order to improve storage system availability and avoid data loss. Failure prediction in this setting would reduce downtime costs by allowing disks to be replaced in advance and their data to be migrated to new devices before it is lost. However, many of the techniques proposed so far mainly detect incipient failures, which does not allow such maintenance tasks to be properly planned. In this work, we present several remaining useful life (RUL) estimation approaches for hard disk drives based on SMART parameters. These approaches model the problem in two different ways. The first is a traditional regression formulation that allows fine-grained predictions. The second models the problem as a multiclass (multinomial) classification task, which gives the system operator greater control over the prediction granularity through a prior configuration of the classes. For the classification formulation, we also explore two important aspects of RUL estimation: the ordinality between classes, and a predictive bias towards classes that indicate a shorter device lifetime whenever a prediction is incorrect. All models are based on Deep Neural Networks (DNNs). To evaluate them, we used a dataset produced by a cloud storage service provider that includes the complete time series of 1,697 failing devices. Experiments showed that the proposed methods produced satisfactory results on the regression task when assessed with metrics designed specifically for prognostics tasks.
When the problem was modeled as a traditional classification task, our model outperformed the baseline model; in the asymmetric and ordinal classification task, it outperformed the baselines on metrics that take both ordinality and asymmetry into account.
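The two formulations described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the dissertation's method: the class boundaries in days (BOUNDARIES), the linear ordinal cost with a late_weight penalty, and the expected-cost decision rule applied to a classifier's class probabilities are hypothetical choices made for illustration, and the DNNs themselves are omitted. For the regression view, it uses the asymmetric scoring function from the 2008 PHM data challenge as one example of a prognostics metric; whether the dissertation used this particular metric is an assumption, and its constants (13 and 10, originally defined for operating cycles) would likely need retuning for disk lifetimes measured in days.

```python
import bisect
import math

# Hypothetical class boundaries in days (an assumption, not taken from the
# dissertation). With [7, 30, 90] the ordinal classes are:
#   0: RUL <= 7, 1: 7 < RUL <= 30, 2: 30 < RUL <= 90, 3: RUL > 90
BOUNDARIES = [7, 30, 90]


def rul_to_class(rul_days, boundaries=BOUNDARIES):
    """Map a remaining-useful-life value to an ordinal class (0 = most urgent)."""
    return bisect.bisect_left(boundaries, rul_days)


def misclassification_cost(true_cls, pred_cls, late_weight=2.0):
    """Hypothetical asymmetric ordinal cost.

    The cost grows with the distance between classes (ordinality), and
    predicting a longer lifetime than the true one (pred_cls > true_cls)
    is multiplied by late_weight (asymmetry).
    """
    distance = pred_cls - true_cls
    if distance > 0:
        return late_weight * distance
    return float(-distance)


def min_expected_cost_class(probs, late_weight=2.0):
    """Pick the class with the lowest expected cost under the class
    probabilities `probs` produced by any classifier."""
    n = len(probs)
    expected = [
        sum(probs[t] * misclassification_cost(t, p, late_weight) for t in range(n))
        for p in range(n)
    ]
    return min(range(n), key=expected.__getitem__)


def phm08_score(true_rul, pred_rul, a_early=13.0, a_late=10.0):
    """Asymmetric PHM 2008 challenge score (lower is better).

    d = predicted - true; late predictions (d > 0) are penalized more
    steeply than early ones (d < 0).
    """
    total = 0.0
    for t, p in zip(true_rul, pred_rul):
        d = p - t
        if d < 0:
            total += math.exp(-d / a_early) - 1
        else:
            total += math.exp(d / a_late) - 1
    return total


if __name__ == "__main__":
    print([rul_to_class(r) for r in (3, 7, 20, 60, 365)])  # [0, 0, 1, 2, 3]
    probs = [0.0, 0.45, 0.55, 0.0]
    # argmax would pick class 2; the asymmetric cost shifts the decision to class 1
    print(min_expected_cost_class(probs))  # 1
    # Predicting 10 days too late costs more than predicting 10 days too early
    print(phm08_score([20], [30]) > phm08_score([30], [20]))  # True
```

Setting late_weight=1.0 makes the cost symmetric, and the same probabilities then yield class 2, the most probable one; raising it pushes uncertain decisions toward shorter-lifetime classes, which mirrors the kind of predictive bias the abstract describes.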