Sparse matrix-vector multiplication (SpMV for short) is one of the most common subroutines in numerical linear algebra. The problem is that the memory access patterns during SpMV are irregular, so cache utilization can suffer from low spatial or temporal locality. Approaches to improving the performance of SpMV are based on matrix reordering and register blocking. These matrix transformations are designed to exploit randomly occurring dense blocks in a sparse matrix, and their efficiency depends strongly on the presence of suitable blocks. The overhead of reorganizing a matrix from one format to another is often on the order of tens of SpMV executions. For that reason, such a reorganization pays off only if the same matrix A is multiplied by multiple different vectors, e.g., in iterative linear solvers. This paper introduces a new approach to accelerating SpMV. The approach consists of three steps: 1) dividing the matrix A into non-empty regions, 2) choosing an efficient way to traverse these regions (in other words, choosing an efficient ordering of the partial multiplications), and 3) choosing the optimal storage type for each region. All three steps are tightly coupled. The first step divides the whole matrix into smaller parts (regions) that can fit in the cache. The second step improves locality during the multiplication through better utilization of distant references. The last step maximizes the machine computation performance of the partial multiplication for each region. In this paper, we describe aspects of these three steps in more detail, including fast and inexpensive algorithms for all of them. Our measurements show that our approach gives a significant speedup for almost all matrices arising from various technical areas.