Accelerating 'fields' by revamping the Cholesky Decomposition / by Vinay B. Ramakrishnaiah, Raghu Raj P. Kumar, John Paige, and Dorit Hammerling
Series: NCAR Technical Notes
Boulder, CO : National Center for Atmospheric Research (NCAR), 2015
Content type:
- text
- unmediated
- volume
- 2153-2397
- 2153-2400
Item type | Current library | Call number | Copy number | Status | Date due | Barcode | Item holds |
---|---|---|---|---|---|---|---|
REPORT | NCAR Library Mesa Lab | 03720 | 1 | Available | | 50583020003871 | |
2015-08
Technical Report
The Geophysical Statistics project group within the Institute for Mathematics Applied to Geosciences (IMAGe) has been using Matrix Algebra on GPU and Multicore Architectures (MAGMA) to accelerate the Cholesky decomposition. The acceleration is motivated by three observations: a) the decomposition is used frequently in key computations in the spatial statistics R ‘fields’ package, b) it is a major bottleneck in ‘fields’ package execution, and c) it operates on large matrices, which makes it suitable for parallelization. The Cholesky decomposition was accelerated last summer using the MAGMA library. However, the accelerated version behaved unexpectedly on multiple GPUs: a) execution time on multiple GPUs was higher than on a single GPU, and b) the deep copy and in-place algorithms had opposite effects on performance on one GPU and on multiple GPUs. Our CPU and GPU profiling, conducted this summer, explains this behavior in the multi-GPU executions. The profiling also showed how to accelerate the Cholesky decomposition further at three levels: a) accelerating the underlying C function, b) reducing the function call overhead in R, and c) optimizing the R environment. We were able to optimize the code and the environment to obtain a speedup greater than 75x (single precision) and 65x (double precision) for large matrices. We also found a potential way to improve the MAGMA functions by replacing the communications with direct device-to-device calls.