![]() |
![]() |
|
|
Efficient Sparse Matrix-Vector Multiplication on CUDA
Nathan Bell and Michael Garland NVIDIA Technical Report, NVR-2008-004, December 2008, [PDF] [Source Code, etc.] The massive parallelism of graphics processing units (GPUs) offers tremendous performance in many high-performance computing applications. While dense linear algebra readily maps to such platforms, harnessing this potential for sparse matrix computations presents additional challenges. Given its role in iterative methods for solving sparse linear systems and eigenvalue problems, sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In this paper we discuss data structures and algorithms for SpMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU. Given the memory-bound nature of SpMV, we emphasize memory bandwidth efficiency and compact storage formats. We consider a broad spectrum of sparse matrices, from those that are well-structured and regular to highly irregular matrices with large imbalances in the distribution of nonzeros per matrix row. We develop methods to exploit several common forms of matrix structure while o ering alternatives which accommodate greater irregularity. On structured, grid-based matrices we achieve performance of 36 GFLOP/s in single precision and 16 GFLOP/s in double precision on a GeForce GTX 280 GPU. For unstructured finite-element matrices, we observe performance in excess of 15 GFLOP/s and 10 GFLOP/s in single and double precision respectively. These results compare favorably to prior state-of-the-art studies of SpMV methods on conventional multicore processors. Our double precision SpMV performance is generally two and a half times that of a Cell BE with 8 SPEs and more than ten times greater than that of a quad-core Intel Clovertown system. |
|
|
|
Algebraic Multigrid for Discrete Differential Forms
Nathan Bell PhD Thesis, August 2008, [PDF] Discrete differential forms arise in scientific disciplines ranging from computational electromagnetics to computer graphics. Examples include stable discretizations of the eddy-current problem, topological methods for sensor network coverage, visualization of complex flows, surface parameterization, and the design of vector fields on meshes. In this thesis we describe efficient and scalable numerical solvers for discrete k-form problems. Our approach is based on the principles of algebraic multigrid (AMG) which is designed to solve large-scale linear systems with optimal, or near-optimal efficiency. Since the k-form problems to be solved are arbitrarily large, the need for scalable numerical solvers is clear. |
|
|
|
Algebraic Multigrid for k-form Laplacians Nathan Bell, and Luke Olson Numerical Linear Algebra with Applications, Volume 15, Issue 2-3, Pages 165-185, 2008, [PDF] In this paper we describe an aggregation-based algebraic multigrid method for the solution of discrete k-form Laplacians. Our work generalizes Reitzinger and Schoberl's algorithm to higher dimensional discrete forms. We provide conditions on the tentative prolongators under which the commutativity of the coarse and fine de Rham complexes is maintained. Further, a practical algorithm that satisfies these conditions is outlined and smoothed prolongation operators and the associated finite element spaces are highlighted. Numerical evidence of the efficiency and generality of the proposed method is presented in the context of discrete Hodge decompositions. |
|
|
|
A Fast Multigrid Algorithm for Mesh Deformation. Lin Shi, Yizhou Yu, Nathan Bell and Wei-Wen Feng. SIGGRAPH 2006 (ACM Transactions on Graphics, Vol. 24, No. 3, 2006), [PDF] [Video] In this paper, we present a multigrid technique for efficiently deforming large surface and volume meshes. We show that a previous least-squares formulation for distortion minimization reduces to a Laplacian system on a general graph structure for which we derive an analytic expression. We then describe an efficient multigrid algorithm for solving the relevant equations. Here we develop novel prolongation and restriction operators used in the multigrid cycles. Combined with a simple but effective graph coarsening strategy, our multigrid algorithm outperforms other direct and multigrid solvers in both time and memory cost. While direct factorization methods have frequently been applied to related problems, it is demonstrated that even for modestly sized meshes, our multigrid solver surpasses the most sophisticated factorization codes. Moreover, since our multigrid solver does not rely on extensive precomputation, it is particularly well suited for integration with a general mesh editing environment where unpredictable combinations of different operations will invalidate the results of any such precomputation. Experimental evidence of these advantages is provided on a number of meshes with a wide range of size. |
|
|
|
Particle-Based Simulation of Granular Materials Nathan Bell, Yizhou Yu, and Peter J. Mucha ACM SIGGRAPH/ Eurographics Symposium on Computer Animation 2005, [PDF] [Video] Granular materials, such as sand and grains, are ubiquitous. Simulating the 3D dynamic motion of such materials represents a challenging problem in graphics because of their unique physical properties. In this paper we present a simple and effective method for granular material simulation. By incorporating techniques from physical models, our approach describes granular phenomena more faithfully than previous methods. Granular material is represented by a large collection of non-spherical particles which may be in persistent contact. The particles represent discrete elements of the simulated material. One major advantage of using discrete elements is that the topology of particle interaction can evolve freely. As a result, highly dynamic phenomena, such as splashing and avalanches, can be conveniently generated by this meshless approach without sacrificing physical accuracy. We generalize this discrete model to rigid bodies by distributing particles over their surfaces. In this way, two-way coupling between granular materials and rigid bodies is achieved. |
|
Thrust - Code at the Speed of Light Thrust is a CUDA library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL). Thrust provides a flexible high-level interface for GPU programming that greatly enhances developer productivity. Develop high-performance applications rapidly with Thrust! For additional information, refer to the project website. |
|
|
PyAMG - Algebraic Multigrid Solvers in Python PyAMG is a library of Algebraic Multigrid (AMG) solvers with a convenient Python interface. PyAMG features implementations of several popular AMG methods including Classical (Ruge-Stuben) AMG, AMG based on Smoothed Aggregation, and Adaptive Smoothed Aggregation (αSA). The predominant portion of PyAMG is written in Python and leverages SciPy. A smaller amount of supporting C++ code is used for performance critical operations. Refer to the PyAMG website for more information. |
|
|
PyDEC - A Discrete Exterior Calculus Framework Soon to be released here. |
|
|
SciPy - Scientific Tools for Python I'm a contributing member of the SciPy project. My primary interest is in sparse linear algebra. |
|
|
Algebraic Multigrid for Discrete Laplacians Nathan Bell and Luke Olson Copper Mountain Conference on Multigrid Methods 2007, [PDF] We describe a method for coarsening chain complexes arising from discrete differential forms. The coarse complexes are shown to retain essential properties of the original complex while being smaller in size. A hierarchy of coarse complexes provides piecewise-constant interpolation operators which can be used in a Smoothed Aggregation multigrid method for efficient solution of discrete Laplacians. |