Architecture Dependent Linear Code Optimizations
Authors: Ivan Šimeček and Pavel Tvrdík

High-performance,  processor architecture, cache hierarchy, Cholesky factorization, Conjugate Gradient algorithm, dynamic loop reversal.


Every mathematical algorithm for the numerical algebra can be relatively easily rewrited into pseudocode: every mathematic operation in the algorithm is simply transformed into one subroutine from the linear algebra package. This simple algorithm leads to very transparent, error-free codes. But these codes do not respect the inner architecture of the CPU and the memory hierarchy and suffer from low temporal and spatial locality. These drawbacks can be reduced by the applications of SW transformation techniques. These transformations result in better temporal and spatial locality or in more efficient utilization of inner pipelines.

In this paper, we will demonstrate idea of SW transformation on 2 codes of iterative solvers. We will also represent the new code restructuring transformation called dynamic loop reversal. This transformation offers a new way to improve the cache utilization.


final version (in .DOC format)

BibTex entry:
  author =       "P. Tvrd\'{\i}k and I. \v{S}ime\v{c}ek",
  title =        "Architecture Dependent Linear Code Optimizations",
  journal =      "CTU Workshop",
  volume =       "9",
  pages =        "178-179",
  month =        mar,
  year =         "2005",
  isbn =         "80-01-03201-9",
  Address =      "Prague, Czech Republic"