Detecção de clusters espaciais através de otimização multiobjetivo (Spatial cluster detection through multi-objective optimization)

Bibliographic details
Year of defense: 2009
Main author: Andre Luiz Fernandes Cancado
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: por
Degree-granting institution: Universidade Federal de Minas Gerais (UFMG)
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/RHCT-7U55SC
Abstract: Irregularly shaped spatial disease clusters occur commonly in epidemiological studies, but their geographic delineation is poorly defined. Most current spatial scan software displays only one of the many possible cluster solutions, which range in shape from the most compact round cluster to the most irregular one and correspond to varying degrees of the penalization parameters imposed on the freedom of shape. Even when a fairly complete set of solutions is available, the choice of the most appropriate parameter setting is left to the practitioner, whose decision is often subjective. We propose quantitative criteria for choosing the best cluster solution through multi-objective optimization, by finding the Pareto set in the solution space. Two competing objectives are involved in the search: regularity of shape and scan statistic value. Instead of sequentially running a cluster-finding algorithm with varying degrees of penalization, all solutions are found in parallel by a genetic algorithm. The method is fast, with good power of detection. Introducing the concept of the Pareto set into this problem, followed by the choice of the most significant solution, is shown to allow a rigorous statement of what constitutes a best solution, without the need for any arbitrary parameter. The concept of cluster significance is extended to this set in a natural way through the attainment function, which is employed as a decision criterion for choosing the optimal solution. The Gumbel and Weibull models are used to approximate the empirical distribution of the scan statistic, speeding up the significance estimation. The multi-objective methodology is compared with the single-objective genetic algorithm. An application to breast cancer cluster detection is discussed. Finally, a knapsack approach is proposed for a relaxed version of the problem, allowing an upper bound to be obtained, in contrast with the lower bounds obtained by the genetic algorithm.
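
Illustrative note: the Pareto set mentioned in the abstract can be sketched with a brute-force dominance filter. This is a minimal sketch, not the thesis's genetic algorithm. It assumes each candidate cluster has already been scored on the two objectives, and that both scores are to be maximized. The names dominates and pareto_set, and all numeric values, are hypothetical and serve only as illustration.

```python
from typing import List, Tuple

# Each candidate cluster is summarized as (shape_regularity, scan_statistic).
# Assumption: higher is better for both objectives.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is at least as good as b on both objectives
    and strictly better on at least one."""
    no_worse = a[0] >= b[0] and a[1] >= b[1]
    strictly_better = a[0] > b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_set(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (O(n^2) scan)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical scores, made up for this example.
candidates = [
    (0.90, 5.0),   # very compact, low scan statistic
    (0.60, 9.0),
    (0.50, 8.0),   # dominated by (0.60, 9.0)
    (0.30, 12.0),  # irregular, high scan statistic
    (0.30, 10.0),  # dominated by (0.30, 12.0)
    (0.80, 4.0),   # dominated by (0.90, 5.0)
]

front = pareto_set(candidates)
print(front)
```

Each surviving candidate represents a different trade-off between compactness and scan statistic value. According to the abstract, the final choice among them is made by significance, using the attainment function, which this sketch does not attempt to reproduce.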