An Implementation of the MRRR Algorithm on a Data-Parallel Coprocessor.
=======================================================================

We present an implementation of the Algorithm of Multiple Relatively Robust Representations (MRRR) for the symmetric tridiagonal eigenvalue problem on a data-parallel coprocessor using the CUDA programming environment. Our results
demonstrate that the algorithm maps well onto a data-parallel architecture and we achieve up to 50-fold speedups over LAPACK’s MRRR implementation.
Our accuracy lacks currently behind those of LAPACK but we believe that these problems can be overcome in the near future without significantly affecting performance.

This file contains some information on the source code release of our implementation. A detailed description of the implementation is available online under www.dgp.toronto.edu/people/lessig/mrrr.

Christian Lessig, University of Toronto
lessig (at) dgp (dot) toronto (edu)

LICENSE
=======

The source code is release under the GPL3 licence. See LICENSE for details.

NOTES
=====

Our implementation uses the infrastructure of the CUDA SDK. Please extract the source code files to a directory in the CUDA SDK projects/ folder.

WINDOWS
-------

There is currently no support for Windows due to (still) inappropriate template support in Visual Studio. The necessary changes are straight-forward but would complicate the source code considerably.
Please send me an email if you are interested in a Windows version.

LINUX
-----

Some minor modifications to common/common.mk are necessary if CLAPACK is to be used. Otherwise use the make flag noclapack=1.

diff new/common.mk original/common.mk 
46d45
< LIB                            ?=
153c152
< LIB    += -L$(CUDA_INSTALL_PATH)/lib -L$(LIBDIR) -L$(COMMONDIR)/lib
---
> LIB       := -L$(CUDA_INSTALL_PATH)/lib -L$(LIBDIR) -L$(COMMONDIR)/lib
155c154
<    LIB += -lcuda -lcudart ${OPENGLLIB} $(PARAMGLLIB) $(CUDPPLIB) 
---
>    LIB += -lcuda -lcudart ${OPENGLLIB} $(PARAMGLLIB) $(CUDPPLIB) ${LIB} 
157c156
<    LIB += -lcudart ${OPENGLLIB} $(PARAMGLLIB) $(CUDPPLIB) 
---
>    LIB += -lcudart ${OPENGLLIB} $(PARAMGLLIB) $(CUDPPLIB) ${LIB}
213c212


GENERAL
-------

Compile Flags
'''''''''''''

Name        Type    Default   Description

noclapack    int     0         If nolapack=1 then no CLapack libs are required
                               and no CPU solution is computed when the program 
                               is executed.

Under Linux also the standard make flags of the CUDA SDK are avaialbe, please refer to the CUDA SDK for details.

Command Line Arguments
''''''''''''''''''''''

Name                  Type    Default      Description

output-path           str     ./testing    Path where all result files are
                                           stored.

max-evals-interval    int     config.h*    Maximum number of eigenvalues in a
                                           sub-interval of the Gerschgorin
                                           interval after splitting.

precision-classify    float   0.0001       Precision for classifying
                                           eigenvalues as singletons or
                                           clusters. (Should be larger or
                                           equal to precision-refine)

precision-refine      float   0.000001     Refinement precision for eigenvalue 
                                           approximation.

matrix-type           str     Rand         Type of input matrix. Currently
                                           supported are {Rand , Wilkinson , 
                                           1-2-1 , Clement}

bisect-multi-splits   int     config.h*    Number of splits of the input
                                           interval at the initial bisection
                                           step for each multiprocessor.
                                           (Should not exceed 
                                            max-evals-interval.)

min-intervals         int     1            Minimum number of intervals into
                                           which the Gerschgorin interval 
                                           *should* be split (there is no
                                           guarantee that a request is actually 
                                           satisfied). Usually, the minimum
                                           number of interval shoul equal the
                                           number of multiprocessors in the
                                           Compute chip employed.

iterations            int     100***       Number of iterations for timings.

matrix-size           int     256          Size of the test matrix.

num-experiments       int     1            Number of independent experiments 
                                           (e.g. when experiments with random
                                            matrices are performed, it is often
                                            useful to perform multiple 
                                            independent experiments and average
                                            over the result)

seed                  uint   time(NULL)    Seed for random number generator for 
                                           random matrices.

*    This value is specified in inc/config.h
***  This value depends on the matrix size and if host or device emulation mode
     is used.

Example:

../../bin/linux/release/mrrr --matrix-size=1024 --min-intervals=8
                                
KNOWN PROBLEMS
==============

For matrices with strong eigenvalue clustering the eigenvectors are currently not always orthogonal.


