Improving of the performance of sparse matrix-vector multiplication on the modern architectures
Author: Ivan Šimeček
Keywords: sparse matrix-vector multiplication, high performance, cache locality, software pipelining, loop unrolling, loop fusion.
In this paper, we describe source code transformations for sparse matrix-vector multiplication based on software pipelining, loop unrolling, and loop fusion. These transformations enable data prefetching, let load instructions overlap with floating-point (FPU) arithmetic instructions, and improve temporal cache locality. The paper also presents an evaluation of these optimizations on various hardware platforms.
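The C sketch below illustrates the three transformations named above. It is not the author's code. It assumes the matrix is stored in CSR (compressed sparse row) format, which this abstract does not state. The type `csr_matrix` and the functions `spmv_csr`, `spmv_csr_unrolled`, and `spmv_csr_fused2` are hypothetical names. The unroll factor of 2 is an arbitrary choice for illustration.

The example also assumes one particular form of loop fusion: computing two products that share the same matrix in a single pass. This is one plausible instance of the technique, not necessarily the variant the paper evaluates.

```c
#include <stddef.h>

/* Hypothetical CSR container (assumption: the paper's storage format is not
   given in the abstract). Row i owns nonzeros [row_ptr[i], row_ptr[i+1]). */
typedef struct {
    size_t n;               /* number of rows */
    const size_t *row_ptr;  /* n + 1 row offsets */
    const size_t *col;      /* column index of each nonzero */
    const double *val;      /* value of each nonzero */
} csr_matrix;

/* Baseline: y = A * x, one multiply-add per nonzero. */
void spmv_csr(const csr_matrix *A, const double *x, double *y)
{
    for (size_t i = 0; i < A->n; i++) {
        double sum = 0.0;
        for (size_t k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
            sum += A->val[k] * x[A->col[k]];
        y[i] = sum;
    }
}

/* Inner loop unrolled by 2 with two independent accumulators, and
   software-pipelined: the next pair of operands is loaded before the
   multiply-adds of the current pair, so loads can overlap FPU work. */
void spmv_csr_unrolled(const csr_matrix *A, const double *x, double *y)
{
    for (size_t i = 0; i < A->n; i++) {
        size_t k = A->row_ptr[i], end = A->row_ptr[i + 1];
        double s0 = 0.0, s1 = 0.0;
        if (end - k >= 2) {
            /* prologue: load the first pair of operands */
            double a0 = A->val[k],     b0 = x[A->col[k]];
            double a1 = A->val[k + 1], b1 = x[A->col[k + 1]];
            k += 2;
            /* steady state: load the next pair, then consume the previous one */
            for (; k + 1 < end; k += 2) {
                double na0 = A->val[k],     nb0 = x[A->col[k]];
                double na1 = A->val[k + 1], nb1 = x[A->col[k + 1]];
                s0 += a0 * b0;
                s1 += a1 * b1;
                a0 = na0; b0 = nb0; a1 = na1; b1 = nb1;
            }
            /* epilogue: consume the last loaded pair */
            s0 += a0 * b0;
            s1 += a1 * b1;
        }
        /* remainder: at most one nonzero is left in the row */
        for (; k < end; k++)
            s0 += A->val[k] * x[A->col[k]];
        y[i] = s0 + s1;
    }
}

/* Loop fusion (illustrative): y1 = A * x1 and y2 = A * x2 are computed in
   one pass, so each val[k] and col[k] is loaded once instead of twice,
   which improves temporal locality of the matrix data. */
void spmv_csr_fused2(const csr_matrix *A, const double *x1, const double *x2,
                     double *y1, double *y2)
{
    for (size_t i = 0; i < A->n; i++) {
        double s1 = 0.0, s2 = 0.0;
        for (size_t k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++) {
            double a = A->val[k];
            size_t j = A->col[k];
            s1 += a * x1[j];
            s2 += a * x2[j];
        }
        y1[i] = s1;
        y2[i] = s2;
    }
}
```

Because `spmv_csr_unrolled` splits each row sum across two accumulators, it changes the floating-point summation order. For general inputs its results may therefore differ from the baseline in the last bits. Whether the C-level restructuring actually pays off also depends on how much pipelining and unrolling the compiler already performs on the target platform.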
Published in CTU FEE POSTER, vol. 9, pp. 182-183, Prague, Czech Republic, March 2005.