Study and optimization for high performance processing with GALPHAT

Bibliographic Details
Main Author: Igor Kolesnikov
Publication Date: 2020
Format: Master thesis
Language: eng
Source: Biblioteca Digital de Teses e Dissertações do INPE
Download full: http://urlib.net/sid.inpe.br/mtc-m21c/2020/04.20.16.10
Summary: Parametric computational modeling of galaxies has a high computational cost. The statistical component of the modeling, which may involve refining the model against the source brightness distribution, gives more satisfactory results when the approach is Bayesian. In this research, we use GALaxy PHotometric ATtributes (GALPHAT) as our main data processing tool. GALPHAT modeling of a single galaxy observed by the Sloan Digital Sky Survey (SDSS) can take about 6 hours. In current cosmology, this type of modeling must be performed on a set of thousands of objects to be scientifically effective. The sample analyzed within the FAPESP thematic project in which LABAC participates contains more than 24,309 objects, a number that demands high-performance computing (HPC) to model the entire sample effectively. The main objective of this postgraduate project is to study and optimize HPC solutions that allow GALPHAT to process an SDSS sample as quickly as possible. For this purpose, we have two HPC systems that can work in a coordinated way to optimize the modeling strategies. The first system belongs to LABAC and is based on the Intel Xeon Phi 7250 platform. The second system belongs to the partition of the multi-core platform of the Santos Dumont supercomputer. The research therefore includes the initial work of setting up and running GALPHAT on both platforms, which use different types of processors and compilers. Across the different processing steps and modeling strategies, we applied refactoring and the complete rewriting of pipeline modules. Our studies found the optimal combination of software, hardware, and optimizations to minimize processing time. This is the first step toward implementing and integrating a graphical user interface to make GALPHAT easier to use.
This dissertation therefore presents all the activities performed to make it possible to process the entire selected sample in a timely manner via HPC, including a description of the benchmarks across the computational systems used. It also covers the development of the auxiliary visualization system.
id INPE_ccb38a39da5265d28cd98ba277254d57
oai_identifier_str oai:urlib.net:sid.inpe.br/mtc-m21c/2020/04.20.16.10.04-0
Keywords: computational cosmology; elliptical galaxies; Bayesian statistics; galaxies structure and environment; high performance computing
dc.title.en.fl_str_mv Study and optimization for high performance processing with GALPHAT
dc.title.alternative.pt.fl_str_mv Estudo e otimização para processamento de alto desempenho com GALPHAT
dc.contributor.advisor1.fl_str_mv Celso Luiz Mendes
dc.contributor.advisor2.fl_str_mv Reinaldo Roberto Rosa
dc.contributor.referee1.fl_str_mv Gilberto Ribeiro de Queiroz
dc.contributor.referee2.fl_str_mv Irapuan Rodrigues de Oliveira Filho
dc.contributor.author.fl_str_mv Igor Kolesnikov
dc.date.issued.fl_str_mv 2020-04-23
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.identifier.uri.fl_str_mv http://urlib.net/sid.inpe.br/mtc-m21c/2020/04.20.16.10
dc.language.iso.fl_str_mv eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.publisher.none.fl_str_mv Instituto Nacional de Pesquisas Espaciais (INPE)
dc.publisher.program.fl_str_mv Programa de Pós-Graduação do INPE em Computação Aplicada
dc.publisher.initials.fl_str_mv INPE
dc.publisher.country.fl_str_mv BR
dc.source.none.fl_str_mv reponame:Biblioteca Digital de Teses e Dissertações do INPE
instname:Instituto Nacional de Pesquisas Espaciais (INPE)
instacron:INPE
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações do INPE - Instituto Nacional de Pesquisas Espaciais (INPE)