An approach to automatic vandalism detection on Wikipedia using active associative learning
Year of defense: | 2012 |
---|---|
Lead author: | |
Advisor: | |
Defense committee: | |
Document type: | Master's thesis |
Access type: | Open access |
Language: | por |
Defending institution: | Universidade Federal de Minas Gerais (UFMG) |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Access link: | http://hdl.handle.net/1843/ESBF-8VMJ98 |
Abstract: | Wikipedia and other free editing services for collaboratively generated content have quickly grown in popularity. However, the lack of editing control has made these services vulnerable to various types of malicious actions, such as vandalism. State-of-the-art vandalism detection methods are based on supervised techniques and thus rely on the availability of large and representative training collections. Building such collections, often with the help of crowdsourcing, is quite costly, as it has to deal with dynamic patterns and a natural skew towards very few vandalism examples in the available data. Aiming to reduce the cost of building such collections, we present a new active sampling technique coupled with an on-demand associative classification algorithm for Wikipedia vandalism detection. We first show that associative classification, enhanced with a simple undersampling technique for building the training set, outperforms state-of-the-art classifiers such as SVMs and kNN, and is competitive with the best results of the CLEF competition on Wikipedia vandalism detection. Furthermore, by applying the active sampling approach, we are able to reduce the need for training by almost 96% with only a small impact on detection results, thus making our solution very practical for real scenarios. |