PPAM'03 - Parallel Processing and Applied Mathematics

Analytical Modeling of Optimized Sparse Linear Code
Authors: Ivan Šimeček and Pavel Tvrdík

High-performance,  analytical cache model, sparse matrix-vector multiplication, cache locality, Conjugate Gradient algorithm.


In this paper, we describe source code transformations based on sw-pipelining, loop unrolling, and loop fusion for the sparse matrix-vector multiplication and for the Conjugate Gradient algorithm that enable data prefetching and overlapping of load and FPU arithmetic instructions and improve the temporal cache locality. We develop a probabilistic model for estimation of the numbers of cache misses for 3 types of data caches: direct mapped and s-way set associative with random and with
LRU replacement strategies. Using HW cache monitoring tools, we compare the predicted number of cache misses with real numbers on Intel x86 architecture with L1 and L2 caches. The accuracy of our analytical model is around 97%. The errors in estimations are due to minor simplifying assumptions in our model.

BibTex entry:
author = {P. Tvrd\'{\i}k and I. \v{S}ime\v{c}ek},
title = {Analytical modeling of optimized sparse linear code},
journal = {PPAM},
series = {Lecture Notes of Computer Science},
volume = {12},
number= {4},
pages = {617--629},
year = {2003},
url = {http://www.springerlink.com/content/drwdhen7db199k05/},
Address = {Czestochova, Poland}