Introduction

Many things can be realized very elegantly and efficiently on a computer today, thanks to progress in software and programming languages. One thing that cannot be done elegantly on a computer is computing. At least not computing fast.

High performance computing (HPC) is to a large extent influenced by a few highly tuned numeric libraries. Assume we want to multiply two matrices, i.e. calculate A = B * C. Then we can use libraries that run at over 90 per cent of peak performance. We only need to write something like:

        // dgemm computes A = alpha * B * C + beta * A, i.e. here A = B * C
        int m= num_rows(B), n= num_cols(C), k= num_cols(B), 
            ldb= B.get_ldim(), ldc= C.get_ldim(), lda= A.get_ldim();
        double alpha= 1.0, beta= 0.0;        // beta= 0.0: plain assignment, no accumulation
        char b_trans= 'N', c_trans= 'N';
        _dgemm(&b_trans, &c_trans, &m, &n, &k, &alpha, &B[0][0], &ldb, 
               &C[0][0], &ldc, &beta, &A[0][0], &lda);

No doubt, next time we call dgemm we will instantly remember the exact order of the 13 arguments. Admittedly, calling the C-BLAS interface looks somewhat nicer, and we can write functions that deal with the dimensions and the orientation, like dgemm(A, B, C). We can furthermore write a polymorphic function gemm that calls _sgemm, _dgemm, and so on accordingly. Indeed, there is a project working on this. But is this all we want? Why not write A = B * C; and let the library call the appropriate BLAS function? And what do we do if there is none?
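
To make the overloading idea concrete, here is a minimal sketch (our illustration, not the code of said project): two C++ overloads of a hypothetical function gemm hide the precision suffix behind one name; cblas_sgemm and cblas_dgemm are the standard C-BLAS entry points.

        #include <cblas.h>

        // Computes A = B * C for single precision ...
        inline void gemm(int m, int n, int k, const float* B, int ldb,
                         const float* C, int ldc, float* A, int lda)
        {
            cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        m, n, k, 1.0f, B, ldb, C, ldc, 0.0f, A, lda);
        }

        // ... and the same interface for double precision.
        inline void gemm(int m, int n, int k, const double* B, int ldb,
                         const double* C, int ldc, double* A, int lda)
        {
            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        m, n, k, 1.0, B, ldb, C, ldc, 0.0, A, lda);
        }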

Programmers working with BLAS libraries are forced to limit themselves to the operations and types provided by these packages. For example, if one wants to use single-precision floats for preconditioner matrices (to save memory bandwidth) while the vectors hold doubles, one cannot use regular BLAS libraries. In contrast, any generic library that contains a matrix vector product can perform this operation.
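
To see why generics make this trivial, consider the following sketch (plain C++, not MTL4 code; mat_vec is a name we made up): the result type is deduced from both operand types, so a single-precision matrix and a double-precision vector combine without any special library support.

        #include <cstddef>
        #include <vector>

        // Hypothetical generic matrix vector product: the result type is
        // whatever multiplying a matrix entry by a vector entry yields.
        template <typename MVal, typename VVal>
        auto mat_vec(const std::vector<std::vector<MVal>>& A,
                     const std::vector<VVal>& x)
        {
            using Result= decltype(MVal{} * VVal{});
            std::vector<Result> y(A.size(), Result{});
            for (std::size_t i= 0; i < A.size(); ++i)
                for (std::size_t j= 0; j < x.size(); ++j)
                    y[i]+= A[i][j] * x[j];   // only needs * and += on the value types
            return y;
        }

        int main()
        {
            std::vector<std::vector<float>> P= {{1.0f, 2.0f}, {3.0f, 4.0f}}; // float preconditioner
            std::vector<double>             x= {1.0, 2.0};                   // double vector
            auto y= mat_vec(P, x);                                           // y is std::vector<double>
        }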

And what if somebody wants to build matrices and vectors of quaternions or intervals? Or rationals? How to calculate with them? Again, this is no problem for a generic library, but it would take an enormous implementation effort in Fortran or C (even more in assembly language to squeeze out the last nanosecond of run-time, for each platform separately).
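
The mat_vec sketch above needs no change for such value types; std::complex, for instance, works out of the box (a quaternion, interval, or rational type providing * and += would behave the same way):

        #include <complex>

        // Continuing the mat_vec sketch from above:
        std::vector<std::vector<std::complex<double>>> Q= {{{0.0, 1.0}, {1.0, 0.0}},
                                                           {{1.0, 0.0}, {0.0, -1.0}}};
        std::vector<std::complex<double>>              z= {{1.0, 0.0}, {0.0, 1.0}};
        auto w= mat_vec(Q, z);   // std::complex provides * and +=, nothing else is needed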

Mathematica and Matlab are far more elegant than C or Fortran libraries. And as long as one uses standard operations such as matrix products, they are fast, since they can use the tuned libraries. As soon as you start programming your own computations, looping over elements of matrices or vectors, your performance won't be impressive, to say the least.

MTL4 allows you to write A = B * C and uses BLAS internally if available. Otherwise it provides an implementation in C++ that is also reasonably fast (we usually reached 60 per cent of peak).
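
In MTL4 the dgemm call from above shrinks to something like this (a sketch; consult the Tutorial for the exact types and setup):

        #include <boost/numeric/mtl/mtl.hpp>

        int main()
        {
            mtl::dense2D<double> A(100, 100), B(100, 100), C(100, 100);
            // ... fill B and C (cf. Matrix Insertion)
            A= B * C;   // dispatched to BLAS if available, to MTL4's C++ kernel otherwise
        }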

All this said, dense matrix multiplication is certainly the most benchmarked operation on high performance computers, but it is not really the operation that high performance computers spend the most time on in real applications. The dominant part of scientific computing in HPC consists of simulations, which are mostly handled with finite element methods (FEM), finite volume methods (FVM), finite difference methods (FDM), or the like. The numeric problems that arise from these methods are almost always linear or non-linear systems of equations in terms of very large sparse matrices and dense vectors.

In contrast to most other libraries, we paid close attention to sparse matrices and their operations. To start with, we developed an efficient method to fill the matrices and compress them in place, cf. Matrix Insertion. This allows for matrix sizes that are close to the memory size. It is also possible to modify the compressed matrices later.
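
A sketch of how filling and in-place compression look (the inserter interface is described on the Matrix Insertion page; details may differ between versions):

        #include <boost/numeric/mtl/mtl.hpp>

        int main()
        {
            mtl::compressed2D<double> A(1000, 1000);
            {
                mtl::matrix::inserter<mtl::compressed2D<double>> ins(A);
                ins[0][0] << 4.0;    // entries are collected while the inserter lives
                ins[0][1] << -1.0;
                ins[1][1] << 4.0;
            }                        // leaving the scope compresses A in place
        }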

The product of a sparse matrix with a dense one allows you to multiply a sparse matrix with multiple vectors simultaneously. Besides the cache reuse regarding the sparse matrix, simple and efficient loop unrolling can be applied. (Performance plots still pending ;-) )

Sparse matrices can be multiplied very fast with MTL4. In the typical case that the number of non-zeros per row and per column is bounded by a constant for any dimension, the run-time of the multiplication is linear in the number of rows or columns. (Remark: we did not use the weaker condition that the number of non-zeros in the matrix is proportional to the dimension. That condition includes the pathological case that the first matrix contains one full column of non-zeros and the second one full row of non-zeros; then the complexity would be quadratic.) Such matrices usually originate from FEM/FDM/FVM discretizations of PDEs on continuous domains, where the number of rows and columns corresponds to the number of nodes or cells in the discretized domain. Sparse matrix products can be very useful in algebraic multigrid methods (AMG).
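
Both kinds of products are written the same way as the dense case (a sketch; multiple vectors are simply stored as the columns of a dense matrix):

        #include <boost/numeric/mtl/mtl.hpp>

        int main()
        {
            const int n= 1000;
            mtl::compressed2D<double> A(n, n), B(n, n), C(n, n);  // sparse
            mtl::dense2D<double>      X(n, 8), Y(n, 8);           // 8 vectors, stored as columns
            // ... fill A, B, and X (cf. Matrix Insertion)
            Y= A * X;   // sparse times dense: one sweep over A serves all 8 vectors
            C= A * B;   // sparse times sparse: linear run-time under the assumption above
        }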

Returning to the expression A = B * C; it can be used to express any product of sparse and dense matrices, and the library dispatches to the appropriate algorithm. Moreover, the expression can also represent a matrix vector product if A and C are column vectors (one would probably choose lower-case names then). In fact, x = y * z can represent four different operations: a matrix product, a matrix vector product, a scalar-matrix product, or a scalar-vector product.
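
Spelled out with MTL4 types (a sketch):

        #include <boost/numeric/mtl/mtl.hpp>

        int main()
        {
            const int n= 100;
            mtl::dense2D<double>      A(n, n), B(n, n), C(n, n);
            mtl::dense_vector<double> v(n), w(n);
            double                    alpha= 2.0;
            // ... fill B, C, and v
            A= B * C;       // matrix product
            w= A * v;       // matrix vector product
            A= alpha * B;   // scalar times matrix
            w= alpha * v;   // scalar times vector
        }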

There is much more to say about MTL4. Some of it you will find in the Tutorial; some of it still needs to be written.

Proceed to the installation guide.
