BeBOP Optimized Sparse Kernel Interface Library  1.0.1h
Data Structures | Typedefs
format.h File Reference

Modified block compressed sparse row data structure. More...

#include <oski/common.h>
#include <oski/matrix.h>
#include <oski/BCSR/format.h>

Go to the source code of this file.

Data Structures

struct  tagOski_submatMBCSR_t
 Stores an MBCSR "submatrix". More...
struct  tagBebop_matMBCSR_t
 Modified block compressed sparse row (MBCSR) format. More...

Defines

#define INC_OSKI_MBCSR_FORMAT_H
 oski/MBCSR/format.h included.
Name mangling.
#define oski_submatMBCSR_t   MANGLE_(oski_submatMBCSR_t)
 MBCSR matrix representation.
#define oski_matMBCSR_t   MANGLE_(oski_matMBCSR_t)
 MBCSR matrix representation.
#define oski_MBCSR_MatMult_funcpt   MANGLE_(oski_MBCSR_MatMult_funcpt)
 MBCSR matrix representation.

Typedefs

typedef struct
tagOski_submatMBCSR_t 
oski_submatMBCSR_t
 Stores an MBCSR "submatrix".
typedef struct tagBebop_matMBCSR_t oski_matMBCSR_t
 Modified block compressed sparse row (MBCSR) format.

Detailed Description

Modified block compressed sparse row data structure.

For an overview, see Modified Block Compressed Sparse Row (MBCSR) Format.


Typedef Documentation

Modified block compressed sparse row (MBCSR) format.

An $m\times n$ matrix $A$ stored in $r\times c$ MBCSR format is logically partitioned row-wise into 3 submatrices, $A = \left(\begin{array} {}A_1 \\ A_2 \\ A_3\end{array}\right)$, where $A$ is stored in $r\times c$ MBCSR format. This partitioning is a `canonical' format in which $A_1$, $A_2$, and $A_3$ have the following properties:

  • $A_1$ consists of all block rows and diagonal blocks of size $r\times c$ and $r\times r$, respectively.
  • $A_2$ contains the degenerate diagonal block of size $r'\times r'$ where $r' < r$, and the corresponding block row of size $r'$ that contains it.
  • $A_3$ contains no diagonal blocks (i.e., is simply stored in BCSR format).

The purpose of this format is to enable fast implementations of the triangular solve and $A^TA\cdot x$ kernels in which the `special case' code to handle degenerate diagonal block, if any, is stored separately in $A_2$, and multiplication by $A_1, A_2,$ and $A_3$ can execute at `full' speed. The canonical form provides a uniform way to treat both square and rectangular matrices stored in MBCSR format.

Stores an MBCSR "submatrix".

An $m\times n$ matrix $B$ stored in $r\times c$ MBCSR submatrix format must satisfy the following conditions:

  • $r$ divides $m$
  • $m \leq n$

Let $M = \frac{m}{r}$ be the number of block rows. Then $B$ is stored in 4 arrays, $(P, J, V, D)$, as follows:

  • $D$ is an array of length $Mr^2$ containing all the elements of $B$ that lie within the $M$ diagonal blocks beginning at element $B(1, \delta)$ where each block is a square $r\times r$ block. The blocks are stored consecutively, and each block is stored in row-major order. More precisely, for all $0 \leq I_0 < M$, $0 \leq i < r$, and $0 \leq j < r$, the array element $D[I_0\cdot r^2 + i\cdot r + j]$ stores the matrix element $B(1+I_0\cdot r + i, \delta+I_0\cdot r + j)$.
  • $P$ is an integer array of block-row pointers, of length at least $M+1$.
  • $J$ is an integer array of block column indices, of length at least $P[M]$.
  • $V$ is an array of non-zero block values, of length at least $P[M]\cdot r\cdot c$.
  • Define block-row $I$, where $0 \leq I < M$, as rows $1+(I-1)\cdot r$ through $I\cdot r$ of $B$. Then For each $k$ such that $P[I-1] \leq k \leq P[I]$, $j_0=J[k]+1$ is the column index of the $(1,1)$ entry of an explicitly stored non-zero block whose values are laid out in row-major order in the subarray $V[krc : (k+1)rc-1]$.
  • If any block in $V$ overlaps with a diagonal block in $D$, then the corresponding entries in $V$ are set to 0.
  • No block in $V$ is defined so that it extends beyond column $n$ of $B$. That is, if $c$ does not divide $n$, then each block row may contain one block with $j_0 = n - c + 1$. If such a block overlaps with another block starting at column index $j_0'$, then the initial columns of the block at $j_0$ are set to zero.