Bibliographic details
Year of defense: |
2021 |
Main author: |
SHIH TING JU |
Advisor: |
Bruno Magalhaes Nogueira |
Defense committee: |
Not provided by the institution |
Document type: |
Master's thesis
|
Access type: |
Open access |
Language: |
por |
Defense institution: |
Fundação Universidade Federal de Mato Grosso do Sul
|
Graduate program: |
Not provided by the institution
|
Department: |
Not provided by the institution
|
Country: |
Brazil
|
Keywords in Portuguese: |
|
Access link: |
https://repositorio.ufms.br/handle/123456789/4032
|
Abstract: |
The large amount of data currently available is a source of information for commercial and academic purposes. One approach to extracting knowledge from such data that has gained prominence is one-class classification (OCC). Using OCC to decide whether an example belongs to a specific class is appropriate for datasets in which the classes are imbalanced or in which only data from the class of interest are available during training. Several OCC algorithms in the literature use unsupervised clustering to delimit the boundary of the class of interest, and their results are competitive with those of other OCC algorithms. Although semi-supervised learning has been shown to achieve better results than unsupervised learning in several areas, semi-supervised clustering is still little explored for OCC. One approach to OCC is Positive and Unlabeled Learning (PUL), in which learning uses only positive (class of interest) and unlabeled data. PUL algorithms seek a delimitation of the positive class. This master’s project proposes a new algorithm, PUL-SSC (Positive and Unlabeled Learning with Semi-Supervised Clustering), that learns the delimitation of the class of interest by creating and using must-link and cannot-link constraints, clustering the data with a semi-supervised algorithm, and applying a transductive learning process for label propagation. Two widely used semi-supervised clustering algorithms were employed: PCK-Means and MPCK-Means. In our experimental evaluation, the semi-supervised algorithms outperformed the k-Means-based algorithm and one-class SVM (OC-SVM) in most scenarios. In particular, the distance-based algorithm MPCK-Means was dominant in most of the comparisons on numerical and textual datasets. |
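The pipeline outlined in the abstract (pairwise constraints, constrained clustering, then label propagation) can be illustrated with a minimal sketch. This is not the PUL-SSC algorithm from the thesis: the abstract does not say how constraints are created, so the heuristic here (must-link between all labeled positives, cannot-link between positives and the unlabeled points farthest from them) is an assumption, as are the function names, the penalty weight `w`, the two-cluster setup, and the toy data. The assignment step only loosely follows the PCK-Means idea of adding a penalty for each violated constraint to the squared distance.

```python
from itertools import combinations

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def build_constraints(points, positive_idx, n_far=2):
    # Assumed heuristic (not from the thesis): labeled positives must link to
    # each other and cannot link to the n_far unlabeled points farthest away.
    must = list(combinations(positive_idx, 2))
    center = mean([points[i] for i in positive_idx])
    pos = set(positive_idx)
    unlabeled = [i for i in range(len(points)) if i not in pos]
    far = sorted(unlabeled, key=lambda i: sq_dist(points[i], center),
                 reverse=True)[:n_far]
    cannot = [(p, u) for p in positive_idx for u in far]
    return must, cannot

def penalty(i, cluster, assign, must, cannot, w):
    # Add w for every constraint that placing point i in `cluster` would violate.
    total = 0.0
    for a, b in must:
        if i in (a, b):
            other = b if i == a else a
            if assign[other] is not None and assign[other] != cluster:
                total += w
    for a, b in cannot:
        if i in (a, b):
            other = b if i == a else a
            if assign[other] == cluster:
                total += w
    return total

def pu_constrained_kmeans(points, positive_idx, w=10.0, n_iter=20):
    must, cannot = build_constraints(points, positive_idx)
    # Deterministic seeding: positive centroid and the point farthest from it.
    pos_center = mean([points[i] for i in positive_idx])
    far_seed = max(range(len(points)),
                   key=lambda i: sq_dist(points[i], pos_center))
    centroids = [pos_center, points[far_seed]]
    assign = [None] * len(points)
    for _ in range(n_iter):
        new_assign = list(assign)
        for i, p in enumerate(points):
            costs = [sq_dist(p, c) + penalty(i, k, new_assign, must, cannot, w)
                     for k, c in enumerate(centroids)]
            new_assign[i] = costs.index(min(costs))
        if new_assign == assign:
            break
        assign = new_assign
        for k in range(len(centroids)):
            members = [points[i] for i, a in enumerate(assign) if a == k]
            if members:
                centroids[k] = mean(members)
    # Transductive step: every point that shares a cluster with a labeled
    # positive is labeled positive; all other points are labeled negative.
    positive_clusters = {assign[i] for i in positive_idx}
    return [a in positive_clusters for a in assign]

# Toy data: indices 0 and 1 are the labeled positives, the rest are unlabeled.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
positive_idx = [0, 1]
labels = pu_constrained_kmeans(points, positive_idx)
print(labels)
```

On this toy data the two unlabeled points near the positives are expected to join the positive cluster, while the three distant points form the other cluster. MPCK-Means additionally learns a distance metric during clustering, which this sketch does not attempt.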