Content

Solving Systems of Linear Equations

In this example we will solve a system of linear equations \(Ax=b\). For this purpose we first compute the \(LU\) factorization of \(A\) with lapack::trf and then use the triangular solver blas::sm to finish the job. In a later example we show how to call the LAPACK driver function lapack::sv which does internnally what you see here.

Example Code

#include <iostream>
#include <flens/flens.cxx>

using namespace std;
using namespace flens;

typedef double   T;

int
main()
{
    typedef GeMatrix<FullStorage<T> >           Matrix;
    typedef DenseVector<Array<T> >              Vector;

    typedef Matrix::IndexType                   IndexType;
    typedef DenseVector<Array<IndexType> >      IndexVector;

    const Underscore<IndexType> _;

    const IndexType m = 4,
                    n = 5;

    Matrix            Ab(m, n);
    IndexVector       piv(m);

    Ab =  2,   3,  -1,   0,  20,
         -6,  -5,   0,   2, -33,
          2,  -5,   6,  -6, -43,
          4,   6,   2,  -3,  49;

    cout << "Ab = " << Ab << endl;

    lapack::trf(Ab, piv);

    const auto A = Ab(_,_(1,m));
    auto       B = Ab(_,_(m+1,n));

    blas::sm(Left, NoTrans, T(1), A.upper(), B);

    cout << "X = " << B << endl;
}

Comments on Example Code

With header flens.cxx all of FLENS gets included.

#include <flens/flens.cxx>

Define some convenient typedefs for the matrix/vector types of our system of linear equations.

    typedef GeMatrix<FullStorage<T> >           Matrix;
    typedef DenseVector<Array<T> >              Vector;

We also need an extra vector type for the pivots. The type of the pivots is taken for the system matrix.

    typedef Matrix::IndexType                   IndexType;
    typedef DenseVector<Array<IndexType> >      IndexVector;

Define an underscore operator for convenient matrix slicing

    const Underscore<IndexType> _;

Set up the baby problem ...

    const IndexType m = 4,
                    n = 5;

Compute the \(LU\) factorization with lapack::trf

    lapack::trf(Ab, piv);

Solve the system of linear equation \(Ax =B\) using blas::sm

    const auto A = Ab(_,_(1,m));
    auto       B = Ab(_,_(m+1,n));

Compile

$shell> cd flens/examples                                                       
$shell> g++ -std=c++11 -Wall -I../.. -o lapack-getrf lapack-getrf.cc            

Run

$shell> cd flens/examples                                                       
$shell> ./lapack-getrf                                                          
Ab = 
            2             3            -1             0            20 
           -6            -5             0             2           -33 
            2            -5             6            -6           -43 
            4             6             2            -3            49 
X = 
            1 
            9 
            9 
            9 

Using QD (Double-Double and Quad-Double Package)

The following description is taken from the QD Library page:

The QD Library supports both a double-double datatype (approx. 32 decimal digits) and a quad-double datatype (approx. 64 decimal digits). The computational library is written in C++. Both C++ and Fortran-90 high-level language interfaces are provided to permit one to use convert an existing C++ or Fortran-90 program to use the library with only minor changes to the source code. In most cases only a few type statements and (for Fortran-90 programs) read/write statements need to be changed. PSLQ and numerical quadrature programs are included.

Modified Example Code

#include <iostream>

#include <qd/qd_real.h>
#include <qd/fpu.h>

#include <flens/flens.cxx>

using namespace std;
using namespace flens;

typedef qd_real   T;

int
main()
{
    unsigned int old_cw;
    fpu_fix_start(&old_cw);

    typedef GeMatrix<FullStorage<T> >           Matrix;
    typedef DenseVector<Array<T> >              Vector;
    typedef Matrix::IndexType                   IndexType;
    typedef DenseVector<Array<IndexType> >      IndexVector;

    const Underscore<IndexType> _;

    const IndexType m = 4,
                    n = 5;

    Matrix            Ab(m, n);
    IndexVector       piv(m);

    Ab =  2,   3,  -1,   0,  20,
         -6,  -5,   0,   2, -33,
          2,  -5,   6,  -6, -43,
          4,   6,   2,  -3,  49;

    cout << "Ab = " << Ab << endl;

    lapack::trf(Ab, piv);

    const auto A = Ab(_,_(1,m));
    auto       B = Ab(_,_(m+1,n));

    blas::sm(Left, NoTrans, T(1), A.upper(), B);

    cout << "X = " << B << endl;

    fpu_fix_end(&old_cw);
}

Comments on Modifications

Only a few modifications were made which will be pointed out in the following:

Include the qd-headers

#include <qd/qd_real.h>
#include <qd/fpu.h>

For using quad-double precision in this example change the typedef to

typedef qd_real   T;

For understanding the next code lines we first take a look at the QD Library documentation:

The algorithms in the QD library assume IEEE double precision floating point arithmetic. Since Intel x86 processors have extended (80-bit) floating point registers, the round-to-double flag must be enabled in the control word of the FPU for this library to function properly under x86 processors. The function fpu_fix_start turns on the round-to-double bit in the FPU control word, while fpu_fix_end will restore the original state.

So the first thing we do in main is turning on the correct rounding ...

int
main()
{
    unsigned int old_cw;
    fpu_fix_start(&old_cw);

... and at the end restore FPU rounding behavior as mentioned above.

    fpu_fix_end(&old_cw);
}

Compile and Run

$shell> cd flens/examples                                                        
$shell> g++ -Wall -std=c++11 -I../.. -I/opt/local/include/ -c qd-lapack-getrf.cc 
$shell> g++ -o qd-lapack-getrf qd-lapack-getrf.o /opt/local/lib/libqd.a          
$shell> ./qd-lapack-getrf                                                        
Ab = 
 2.000000e+00  3.000000e+00 -1.000000e+00  0.000000e+00  2.000000e+01 
-6.000000e+00 -5.000000e+00  0.000000e+00  2.000000e+00 -3.300000e+01 
 2.000000e+00 -5.000000e+00  6.000000e+00 -6.000000e+00 -4.300000e+01 
 4.000000e+00  6.000000e+00  2.000000e+00 -3.000000e+00  4.900000e+01 
X = 
 1.000000e+00 
 9.000000e+00 
 9.000000e+00 
 9.000000e+00 

Using mpfr::real (C++ Interface for mpfr)

The following description is taken from the mpfr::real page:

The class mpfr::real is a high-level C++ wrapper for the GNU MPFR library, a C library for multiple-precision floating-point computations with correct rounding.

The objects of mpfr::real can (almost) be used like doubles or other fundamental floating-point types, thus avoiding the use of C functions to manipulate MPFR's low-level data type directly. In addition to all arithmetic and comparison operators available for fundamental floating-point types, mpfr::real also supports the set of mathematical functions for double from math.h/cmath. Finally, std::istream operator >> and std::ostream operator << are implemented for mpfr::reals. This allows to substitute double with mpfr::real with no further modifications of the code in many cases.

Modified Example Code

#include <iostream>

#define REAL_ENABLE_CONVERSION_OPERATORS
#include <external/real.hpp>
#include <flens/flens.cxx>

using namespace std;
using namespace flens;

typedef mpfr::real<53>   T;

int
main()
{
    typedef GeMatrix<FullStorage<T> >           Matrix;
    typedef DenseVector<Array<T> >              Vector;
    typedef Matrix::IndexType                   IndexType;
    typedef DenseVector<Array<IndexType> >      IndexVector;

    const Underscore<IndexType> _;

    const IndexType m = 4,
                    n = 5;

    Matrix            Ab(m, n);
    IndexVector       piv(m);

    Ab =  2,   3,  -1,   0,  20,
         -6,  -5,   0,   2, -33,
          2,  -5,   6,  -6, -43,
          4,   6,   2,  -3,  49;

    cout << "Ab = " << Ab << endl;

    lapack::trf(Ab, piv);

    const auto A = Ab(_,_(1,m));
    auto       B = Ab(_,_(m+1,n));

    blas::sm(Left, NoTrans, T(1), A.upper(), B);

    cout << "X = " << B << endl;
}

Comments on Modifications

Include header file for mpfr::real before flens.cxx and make sure that conversion operators are enabled

#define REAL_ENABLE_CONVERSION_OPERATORS
#include <external/real.hpp>
#include <flens/flens.cxx>

Make typedef for using mpfr::real

typedef mpfr::real<53>   T;

Compile and Run

$shell> cd flens/examples                                                       
$shell> g++ -Wall -std=c++11 -I../.. -I/opt/local/include -o mpfr-real-lapack-getrf mpfr-real-lapack-getrf.cc -L /opt/local/lib -lmpfr -lgmp                                      
$shell> ./mpfr-real-lapack-getrf                                                
Ab = 
 2.000000e+00  3.000000e+00 -1.000000e+00  0.000000e+00  2.000000e+01 
-6.000000e+00 -5.000000e+00  0.000000e+00  2.000000e+00 -3.300000e+01 
 2.000000e+00 -5.000000e+00  6.000000e+00 -6.000000e+00 -4.300000e+01 
 4.000000e+00  6.000000e+00  2.000000e+00 -3.000000e+00  4.900000e+01 
X = 
 1.000000e+00 
 9.000000e+00 
 9.000000e+00 
 9.000000e+00 

For Performance: Link with ATLAS

For high performance you have to link with an optimized BLAS implementation like GotoBLAS or ATLAS. For ATLAS pass -DWITH_ATLAS to the compiler:

$shell> cd flens/examples                                                       
$shell> g++ -DWITH_ATLAS -Wall -std=c++11 -I../.. -I/opt/local/include -o atlas-lapack-getrf lapack-getrf.cc -latlas -lcblas               
$shell> ./atlas-lapack-getrf                                                    
Ab = 
            2             3            -1             0            20 
           -6            -5             0             2           -33 
            2            -5             6            -6           -43 
            4             6             2            -3            49 
X = 
            1 
            9 
            9 
            9 

If you want to check which BLAS function gets called also pass -DDEBUG_CXXBLAS to the compiler and grep and sort for CXXBLAS (note that the problem size in this example is so small that no LEVEL 3 BLAS function gets used):

$shell> cd flens/examples                                                       
$shell> g++ -DWITH_ATLAS -DCXXBLAS_DEBUG -Wall -std=c++11 -I../.. -I/opt/local/include -o atlas-lapack-getrf lapack-getrf.cc -latlas -lcblas                                                 
$shell> ./atlas-lapack-getrf 2>&1 | grep CXXBLAS | sort -u                      
CXXBLAS: [ATLAS] cblas_dger
CXXBLAS: [ATLAS] cblas_dscal
CXXBLAS: [ATLAS] cblas_dswap
CXXBLAS: [ATLAS] cblas_dtrsm
CXXBLAS: [ATLAS] cblas_idamax

For optimal performance also pass -DNDEBUG and -O3.