Parallel self-verified solver for dense linear systems
Defense year: | 2009 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Thesis |
Access type: | Open access |
Language: | por |
Defense institution: | Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Access link: | http://hdl.handle.net/10923/1600 |
Abstract: | This thesis presents a free, fast, reliable and accurate solver for point and interval dense linear systems. The goal was to implement a solver for dense linear systems that combines a verified method, interval arithmetic and directed roundings with MPI communication primitives and optimized libraries, providing self-verification and speed-up at the same time. A first parallel implementation was developed using the C-XSC library. However, this C-XSC-based parallel method did not achieve the expected overall performance, because its implementation properties (special variables and the optimal scalar product) prevented the solver from being fully parallelized. Since C-XSC did not appear to be the most efficient tool for time-critical applications, we proposed and implemented a new sequential verified solver for dense linear systems with point and interval input data, using both infimum-supremum and midpoint-radius arithmetic on top of highly optimized libraries (BLAS/LAPACK). Performance tests showed that the midpoint-radius algorithm needs approximately the same time to solve a linear system with point or interval input data, while the infimum-supremum algorithm needs much more time for interval data. For this reason, midpoint-radius arithmetic was the natural choice for the next step of this work: the parallel implementation. We then developed a new parallel verified solver for point and interval dense linear systems using midpoint-radius arithmetic, directed roundings and optimized libraries (PBLAS/ScaLAPACK). The performance results showed very good speed-ups across a wide range of processor counts for large matrices, for both point and interval input data. To overcome the memory limitation imposed by generating the whole matrix on one processor, each available node instead generates its own sub-matrices of the input matrix, which makes better use of the global memory. 
These modifications made it possible to solve dense systems of dimension up to 100 000. In addition, to investigate the portability of the proposed solution, tests were performed on three clusters in Germany with distinct configurations (ALiCEnext, XC1 and IC1); the results indicate that the parallel solver scales well even for very large dense systems on many processors. Further investigations went in two directions: the use of dedicated threads to speed up the dense linear system solver on shared-memory machines, especially dual-core processors, and the application of the ideas presented in this thesis to speed up the C-XSC library. |
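The abstract names a "verified method" without spelling it out. A common method in this family is the residual-based inclusion test: compute an approximate inverse R and approximate solution x̃ in ordinary floating point, enclose Z ⊇ R(b − A·x̃) and C ⊇ I − R·A with interval arithmetic, and search for an interval vector Y with Z + C·Y strictly inside Y; when that holds, A is nonsingular and the exact solution lies in x̃ + Z + C·Y. The sketch below assumes that method (it is not the thesis code) and uses infimum-supremum intervals. Instead of switching the processor's rounding mode, it emulates directed rounding by moving each round-to-nearest result one ulp outward with `math.nextafter` (Python 3.9+). All function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (infimum, supremum)

def down(x: float) -> float:
    # One ulp toward -inf: a lower bound for the exact result of the single
    # round-to-nearest operation that produced x (emulated downward rounding).
    return math.nextafter(x, -math.inf)

def up(x: float) -> float:
    # One ulp toward +inf: an upper bound for that exact result.
    return math.nextafter(x, math.inf)

def point(x: float) -> Interval:
    return (x, x)

def iadd(a: Interval, b: Interval) -> Interval:
    return (down(a[0] + b[0]), up(a[1] + b[1]))

def imul(a: Interval, b: Interval) -> Interval:
    ps = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (down(min(ps)), up(max(ps)))

def approx_inverse(A: List[List[float]]) -> List[List[float]]:
    # Plain floating-point Gauss-Jordan with partial pivoting. R only has to be
    # good enough for the inclusion test to succeed; it is never trusted.
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def verified_solve(A: List[List[float]], b: List[float], max_iter: int = 10) -> Optional[List[Interval]]:
    n = len(A)
    R = approx_inverse(A)
    xt = [sum(R[i][k] * b[k] for k in range(n)) for i in range(n)]
    # D encloses the residual b - A*xt; Z encloses R*D.
    D = []
    for i in range(n):
        d = point(b[i])
        for k in range(n):
            d = iadd(d, imul(point(-A[i][k]), point(xt[k])))
        D.append(d)
    Z = []
    for i in range(n):
        z = point(0.0)
        for k in range(n):
            z = iadd(z, imul(point(R[i][k]), D[k]))
        Z.append(z)
    # C encloses I - R*A.
    C = []
    for i in range(n):
        row = []
        for j in range(n):
            c = point(1.0 if i == j else 0.0)
            for k in range(n):
                c = iadd(c, imul(point(-R[i][k]), point(A[k][j])))
            row.append(c)
        C.append(row)
    X = Z
    for _ in range(max_iter):
        # Epsilon-inflation: Y may be any interval vector, so this widening
        # does not need to be rigorous.
        Y = [(down(lo - 0.1 * (hi - lo)), up(hi + 0.1 * (hi - lo))) for lo, hi in X]
        Xn = []
        for i in range(n):
            s = Z[i]
            for k in range(n):
                s = iadd(s, imul(C[i][k], Y[k]))
            Xn.append(s)
        if all(Y[i][0] < Xn[i][0] and Xn[i][1] < Y[i][1] for i in range(n)):
            # Inclusion verified: the exact solution lies in xt + Xn.
            return [iadd(point(xt[i]), Xn[i]) for i in range(n)]
        X = Xn
    return None  # verification failed, e.g. A too ill-conditioned
```

For a well-conditioned system such as `verified_solve([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])`, the returned intervals are a few ulps wide around 0.1 and 0.6. Returning `None` rather than an unverified answer is the point of self-verification: the solver either proves its bounds or reports failure.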
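The observation that midpoint-radius arithmetic costs about the same for point and interval data is easier to see from how an interval matrix product is enclosed in that representation. With A ∈ ⟨mA, rA⟩ and B ∈ ⟨mB, rB⟩, every product A·B satisfies |A·B − mA·mB| ≤ |mA|·rB + rA·(|mB| + rB), so the enclosure needs only point products of midpoints and radii, with no per-element sign case analysis of the kind infimum-supremum multiplication requires. The midpoint product itself is bracketed by a downward- and an upward-rounded evaluation. In an optimized solver these would presumably be BLAS calls made under switched rounding modes; the sketch below is a pure-Python illustration that emulates directed rounding with `math.nextafter`, and `midrad_matmul` is a hypothetical name, not the thesis API.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def down(x: float) -> float:
    # Lower bound for the exact result of the round-to-nearest op giving x.
    return math.nextafter(x, -math.inf)

def up(x: float) -> float:
    # Upper bound for the exact result of the round-to-nearest op giving x.
    return math.nextafter(x, math.inf)

def midrad_matmul(mA: Matrix, rA: Matrix, mB: Matrix, rB: Matrix) -> Tuple[Matrix, Matrix]:
    """Enclose {A @ B : |A - mA| <= rA, |B - mB| <= rB} as <mC, rC>.
    Radii rA and rB are assumed nonnegative."""
    n, p, q = len(mA), len(mB), len(mB[0])
    mC = [[0.0] * q for _ in range(n)]
    rC = [[0.0] * q for _ in range(n)]
    for i in range(n):
        for j in range(q):
            lo = hi = 0.0  # lo <= sum_k mA[i][k]*mB[k][j] <= hi, exactly
            rad = 0.0      # upper bound for (|mA| rB + rA (|mB| + rB))[i][j]
            for k in range(p):
                a, b = mA[i][k], mB[k][j]
                lo = down(lo + down(a * b))
                hi = up(hi + up(a * b))
                rad = up(rad + up(abs(a) * rB[k][j]))
                rad = up(rad + up(rA[i][k] * up(abs(b) + rB[k][j])))
            m = lo + 0.5 * (hi - lo)          # any float works as midpoint
            r0 = max(up(m - lo), up(hi - m))  # so that [lo, hi] lies in <m, r0>
            mC[i][j] = m
            rC[i][j] = up(r0 + rad)
    return mC, rC
```

With zero radii the same routine handles point data and returns radii of only a few ulps, covering the rounding error of mA·mB; interval data only adds the two radius products per term.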
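Generating sub-matrices on each node works because a distributed matrix layout tells every process, without communication, which global entries it owns. ScaLAPACK's standard layout is the two-dimensional block-cyclic distribution; assuming the solver uses it, each process in an nprow × npcol grid can count its local rows and columns and map local indices back to global ones, then evaluate a matrix generator only for those entries. The sketch below mirrors the counting rule of ScaLAPACK's `NUMROC` and the index mapping of `INDXL2G` (0-based, first block on process 0); `generate_local_block` and the `entry(i, j)` generator are hypothetical.

```python
from typing import Callable, List

def numroc(n: int, nb: int, iproc: int, nprocs: int) -> int:
    """Rows (or columns) of an n-long dimension, cut into blocks of nb and
    dealt cyclically over nprocs processes, that process iproc owns."""
    nblocks = n // nb
    num = (nblocks // nprocs) * nb
    extra = nblocks % nprocs
    if iproc < extra:
        num += nb
    elif iproc == extra:
        num += n % nb  # the trailing partial block
    return num

def local_to_global(l: int, nb: int, iproc: int, nprocs: int) -> int:
    """0-based local index on process iproc -> 0-based global index."""
    return ((l // nb) * nprocs + iproc) * nb + l % nb

def generate_local_block(n: int, nb: int, prow: int, pcol: int, nprow: int, npcol: int,
                         entry: Callable[[int, int], float]) -> List[List[float]]:
    """Build only the local part of an n x n matrix on process (prow, pcol);
    the full matrix never exists on any single process."""
    mloc = numroc(n, nb, prow, nprow)
    nloc = numroc(n, nb, pcol, npcol)
    return [[entry(local_to_global(li, nb, prow, nprow),
                   local_to_global(lj, nb, pcol, npcol))
             for lj in range(nloc)]
            for li in range(mloc)]
```

For example, with n = 10, nb = 3 and three process rows, `numroc(10, 3, 0, 3)` is 4: process row 0 holds global rows 0 to 2 and row 9. Per-process memory therefore scales with about n²/(nprow·npcol) instead of n², which is what allows much larger systems than generating the whole matrix on one processor.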