Incorporating MAGMA into the `fields' spatial statistics package / by John Paige, Isaac Lyngaas, Vinay Ramakrishnaiah, Dorit Hammerling, Raghu Kumar, and Douglas Nychka

By:

National Center for Atmospheric Research (NCAR)

Contributor(s):

National Center for Atmospheric Research (U.S.)

Series: NCAR Technical Notes

Boulder, CO : National Center for Atmospheric Research (NCAR), 2015

Content type:

text

Media type:

unmediated

Carrier type:

volume

ISSN:

2153-2397
2153-2400

Abstract: In this report we describe how to incorporate the Cholesky decomposition from the Matrix Algebra on GPU and Multicore Architectures (MAGMA) library into some of the calculations of the `fields' spatial statistics package in R. We provide MAGMA installation instructions as well as demonstrations of performance on simulated datasets and on the CO2 dataset available in fields. While other spatial statistics packages in R use parallelism, such as bigGP and parspatstat, none to our knowledge directly incorporates GPUs or other coprocessors. Our code is timed on Caldera computational nodes in the National Center for Atmospheric Research's Yellowstone supercomputing environment. We find that for 40,000 x 40,000 matrices the MAGMA-accelerated decomposition achieves speedups of 30.7 and 46.2 times for 1 and 2 GPU implementations respectively over chol, the standard Cholesky decomposition function in R (with settings allowing R programmers to use our accelerated function as they would chol). The speedups are greater for in-place calculations, where the original matrix is overwritten rather than copied: in that case the equivalent speedups are 41.8 and 54.4 times on 1 and 2 GPUs respectively. We also time a simple spatial analysis workflow with maximum likelihood estimation, with datasets ranging up to more than 23,000 observations, where accelerated workflows achieved approximately 4.2 and 4.3 times speedup when using 1 and 2 GPUs respectively over a corresponding unaccelerated workflow. As problem size increases, speedups improve, and the 2 GPU decompositions perform increasingly well compared to their corresponding 1 GPU implementations. In some cases the 2 GPU decompositions are slower than the 1 GPU decompositions because of additional communication overheads and data dependencies in the Cholesky decomposition algorithm; this behavior will be explored further in Ramakrishnaiah et al. (2015).
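
The abstract's distinction between copying and in-place decompositions can be illustrated with a small sketch. This is not the report's MAGMA or R code. It is a plain-Python, CPU-only illustration under stated assumptions: the function name `cholesky_upper` and its `in_place` flag are hypothetical, and the routine returns an upper-triangular factor R with R^T R = A, the convention used by R's chol. With `in_place=False` the input is copied first, so the caller's matrix survives; with `in_place=True` the input is overwritten and the copy is avoided.

```python
import copy
import math


def cholesky_upper(a, in_place=False):
    """Return the upper-triangular factor r with r^T r = a.

    `a` is a symmetric positive-definite matrix stored as a list of
    row lists. Hypothetical illustration only: with in_place=True the
    caller's matrix is overwritten; otherwise a deep copy is factored.
    """
    r = a if in_place else copy.deepcopy(a)
    n = len(r)
    for j in range(n):
        # Diagonal entry: subtract squares of the entries above it.
        s = r[j][j] - sum(r[k][j] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        r[j][j] = math.sqrt(s)
        # Remaining entries of row j, to the right of the diagonal.
        for i in range(j + 1, n):
            dot = sum(r[k][j] * r[k][i] for k in range(j))
            r[j][i] = (r[j][i] - dot) / r[j][j]
        # The strictly lower triangle is never read again, so clear it.
        for i in range(j + 1, n):
            r[i][j] = 0.0
    return r


if __name__ == "__main__":
    a = [[4.0, 2.0, 2.0],
         [2.0, 5.0, 3.0],
         [2.0, 3.0, 6.0]]
    factor = cholesky_upper(a)            # a is left untouched
    print(factor)
    same = cholesky_upper(a, in_place=True)  # a now holds the factor
    print(same is a)
```

Because the reported speedups are all measured against the same chol baseline, their ratios compare the GPU configurations directly: 46.2 / 30.7 is about 1.5 for 2 GPUs versus 1 GPU with copying, and 54.4 / 41.8 is about 1.3 for the in-place case.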

Holdings
Item type	Current library	Call number	Copy number	Status	Date due	Barcode	Item holds
REPORT	NCAR Library Mesa Lab	03721	1	Available		50583020003889

Total holds: 0

Title notes

2015-08

Technical Report
