A new approach to multiple instance learning, based on instance selection via kernel density estimation

Bibliographic details
Year of defense: 2016
Main author: Alexandre Wagner Chagas Faria
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal de Minas Gerais (UFMG)
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords (Portuguese):
Access link: http://hdl.handle.net/1843/BUBD-ADLMQR
Abstract: Multiple Instance Learning (MIL) is a generalization of supervised learning. MIL has been used in numerous applications where labeling each individual instance for the learning step is impossible or impractical. To deal with this family of problems, MIL proposes a new paradigm that assigns a single label (positive or negative) to a set of instances, called a bag. More formally, a bag is labeled positive if it contains at least one positive instance, and negative if all of its instances are negative. Although the literature contains a considerable number of MIL algorithms, few works provide balanced results across the majority of datasets. Furthermore, a deeper comparative analysis of the existing methods is not available. This work proposes two new algorithms based on instance selection by likelihood computation, using Kernel Density Estimation. The approach uses the LogitBoost algorithm as the classifier. The instance selection step aims to identify the most representative instances in each positive bag, eliminating possible noisy instances inside those bags and thereby making the learning step more robust. Statistical tests demonstrated that the proposed methods are comparable with the best algorithms in the literature, outperforming all of them on some datasets. This work also develops a new application based on the proposed method to select the patients that best represent each class in a Leukemia dataset. In the experiments, it was possible to reduce the number of training patients by half and obtain slightly better results than when all patients in the dataset are used.
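
Illustrative sketch (not taken from the thesis): the abstract does not spell out how the likelihood is computed or how instances are ranked, so the following Python example only shows one plausible reading of "instance selection by likelihood computation, using Kernel Density Estimation". It assumes that a Gaussian KDE is fitted on the instances of the negative bags (which are all negative by the MIL definition) and that, in each positive bag, the instances least likely under that density are kept as representatives. The function names `gaussian_kde_logpdf` and `select_instances`, the isotropic Gaussian kernel, the fixed `bandwidth`, and the parameter `k` are all hypothetical choices made for illustration. The LogitBoost classification step is not shown.

```python
import math


def gaussian_kde_logpdf(x, samples, bandwidth):
    """Log-density of point x under an isotropic Gaussian KDE built on samples."""
    d = len(x)
    # Log of the kernel normalisation constant and of the 1/n averaging factor.
    log_norm = -0.5 * d * math.log(2.0 * math.pi * bandwidth ** 2) - math.log(len(samples))
    exponents = [
        -sum((xi - si) ** 2 for xi, si in zip(x, s)) / (2.0 * bandwidth ** 2)
        for s in samples
    ]
    m = max(exponents)  # log-sum-exp trick for numerical stability
    return log_norm + m + math.log(sum(math.exp(e - m) for e in exponents))


def select_instances(positive_bags, negative_bags, bandwidth=1.0, k=1):
    """Keep, per positive bag, the k instances least likely under the negative KDE.

    Assumption for this sketch: instances that look unlike the known negatives
    are the most representative positives in their bag.
    """
    negatives = [inst for bag in negative_bags for inst in bag]
    selected = []
    for bag in positive_bags:
        ranked = sorted(
            bag, key=lambda inst: gaussian_kde_logpdf(inst, negatives, bandwidth)
        )
        selected.append(ranked[:k])
    return selected


# Toy data: negatives cluster near the origin; each positive bag hides one outlier.
negative_bags = [[(0.0, 0.1), (0.2, -0.1)], [(-0.1, 0.0), (0.1, 0.2)]]
positive_bags = [[(0.1, 0.0), (3.0, 3.1)], [(2.9, 3.0), (-0.2, 0.1), (0.0, 0.0)]]

chosen = select_instances(positive_bags, negative_bags)
print(chosen)  # [[(3.0, 3.1)], [(2.9, 3.0)]]
```

In a full pipeline, the selected instances (labeled positive) and the negative instances would then be passed to a standard supervised classifier such as LogitBoost. The fixed bandwidth would normally be tuned, for example by cross-validation.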