Aplicação de CNN e LLM na Localização de Defeitos de Software (Application of CNN and LLM to Software Fault Localization)

Basílio Neto, Altino Dantas

Bibliographic details
Defense year:	2024
Main author:	Basílio Neto, Altino Dantas
Advisor:	Camilo Júnior, Celso Gonçalves
Examining committee:	Camilo Júnior, Celso Gonçalves; Leitão Júnior, Plínio de Sá; Oliveira, Sávio Salvarino Teles de; Vincenzi, Auri Marcelo Rizzo; Souza, Jerffeson Teixeira de
Document type:	Thesis
Access type:	Open access
Language:	Portuguese (por)
Defending institution:	Universidade Federal de Goiás
Graduate program:	Programa de Pós-graduação em Ciência da Computação (INF)
Department:	Instituto de Informática - INF (RMG)
Country:	Brazil
Portuguese keywords:	Localização de defeitos; Redes Neurais Artificiais; Redes Neurais Convolucionais; Modelos de Linguagem de Grande Porte
English keywords:	Fault Localization; Artificial Neural Network; Convolutional Neural Networks; Large Language Model
CNPq knowledge area:	Exact and Earth Sciences::Computer Science
Access link:	http://repositorio.bc.ufg.br/tede/handle/tede/13794
Abstract:	The growing number and complexity of computational systems have led to an increase in software defects. The industry invests significant resources in debugging, and a considerable portion of that cost is associated with locating the program element responsible for a defect. Automated fault localization techniques have been widely explored, with recent advances driven by deep learning models that combine different types of information about the defective source code. However, the accuracy of these techniques still leaves room for improvement, indicating open challenges in the field. This work aims to formalize and investigate the aspects with the greatest impact on fault localization techniques, proposing a framework for characterizing approaches to the problem and two solution methodologies: a) one based on convolutional neural networks (CNNs) and b) one based on large language models (LLMs). Experiments on public Java and Python datasets showed that the CNN-based approach is comparable to traditional methods but inferior to other methods in the literature. The LLM-based approach, on the other hand, greatly outperformed heuristics such as Ochiai and Tarantula and proved competitive with more recent work. An experiment in a scenario free of data leakage showed that LLM-based approaches can be improved by combining them with the Ochiai heuristic.
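The abstract uses the spectrum-based heuristics Ochiai and Tarantula as baselines. The sketch below is a minimal illustration, assuming the commonly cited formulations computed from per-element test-coverage counts; the names `Spectrum`, `ef`, `ep`, `nf` and `np` are notation chosen for this example, not taken from the thesis. The record does not describe how the LLM scores were combined with Ochiai, so `hypothetical_fusion` (a weighted sum of min-max normalized scores with an assumed `weight` parameter) is purely illustrative and should not be read as the author's method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Spectrum:
    """Test-coverage counts for one program element (e.g. a method).

    Notation (an assumption of this sketch, not taken from the thesis):
    ef/ep = failing/passing tests that execute the element,
    nf/np = failing/passing tests that do not execute it.
    """
    ef: int
    ep: int
    nf: int
    np: int


def ochiai(s: Spectrum) -> float:
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep))."""
    denom = math.sqrt((s.ef + s.nf) * (s.ef + s.ep))
    return s.ef / denom if denom > 0 else 0.0


def tarantula(s: Spectrum) -> float:
    """Tarantula suspiciousness: fail ratio / (fail ratio + pass ratio)."""
    failed, passed = s.ef + s.nf, s.ep + s.np
    fail_ratio = s.ef / failed if failed > 0 else 0.0
    pass_ratio = s.ep / passed if passed > 0 else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom > 0 else 0.0


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; constant inputs all map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span > 0 else 0.0 for k, v in scores.items()}


def hypothetical_fusion(llm: dict[str, float], och: dict[str, float],
                        weight: float = 0.5) -> list[str]:
    """HYPOTHETICAL combination of LLM and Ochiai scores (not the thesis's method).

    Weighted sum of min-max normalized scores; both dicts must share keys.
    Returns element names ranked from most to least suspicious.
    """
    a, b = min_max(llm), min_max(och)
    fused = {k: weight * a[k] + (1 - weight) * b[k] for k in llm}
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    # Toy spectra: 2 failing and 8 passing tests in total.
    spectra = {
        "m1": Spectrum(ef=2, ep=1, nf=0, np=7),
        "m2": Spectrum(ef=1, ep=4, nf=1, np=4),
        "m3": Spectrum(ef=0, ep=6, nf=2, np=2),
    }
    ranking = sorted(spectra, key=lambda k: ochiai(spectra[k]), reverse=True)
    print(ranking)  # ['m1', 'm2', 'm3']
```

With these toy spectra, both heuristics rank `m1` first, since it is executed by every failing test and by only one passing test. Score fusion by normalized weighted sum is only one of several plausible ways to combine rankings; rank-based fusion would be an equally reasonable assumption.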
