Classificação associativa sob demanda (On-demand associative classification)

Bibliographic details
Year of defense: 2009
Main author: Adriano Alonso Veloso
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal de Minas Gerais
UFMG
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/SLSS-7WFMGG
Abstract: The ultimate goal of machines is to help humans solve problems. The solutions to such problems are typically programmed by experts, and the machines need only follow the specified steps. However, the solutions to some problems may be too difficult to program explicitly. In such cases, instead of directly programming machines to solve the problem, machines can be programmed to learn the solution. Machine learning encompasses the techniques used to program machines to learn. It is one of the fastest-growing research areas today, mainly because improved learning techniques would open up many new uses for machines (i.e., problems for which the solution is hard to program by hand). A prominent approach to machine learning is to repeatedly demonstrate how the problem is solved and let the machine learn by example, so that it generalizes some rules about the solution and turns these into a program. This process is known as supervised learning. Specifically, the machine takes matched values of inputs (instances of the problem to be solved) and outputs (the solutions) and absorbs whatever information their relation contains in order to emulate the true mapping of inputs to outputs. When outputs are drawn from a pre-specified, finite set of possibilities, the process is known as classification, which is a major data mining task. Some classification problems are hard to solve, and these motivate this thesis. The key insight exploited in this thesis is that a difficult problem can be decomposed into several much simpler sub-problems. The thesis shows that, instead of directly solving a difficult problem, independently solving its sub-problems while taking into account their particular demands often leads to improved classification performance.
This is shown empirically by solving real-world problems (for which the solutions are hard to program) using the computationally efficient algorithms presented in this thesis. These problems include document categorization and name disambiguation in digital libraries, ranking of documents retrieved by search engines, protein functional analysis, and revenue optimization, among others. Improvements in classification performance are reported for all these problems (in some cases with gains of more than 100%). Theoretical evidence supporting our algorithms is also provided.