Efficient cache utilization is crucial
for the good performance of any computer program.
In numerical linear algebra libraries (such as BLAS or LAPACK), good
cache utilization is achieved by explicit loop restructuring. It
includes loop unroll-and-jam, which increases FPU pipeline
utilization in the innermost loop; loop blocking (which is why we call
these codes blocked for short); and loop interchange, which maximizes the
cache hit ratio. After applying these transformations, the
codes are divided into two parts: the outer loops are "out-cache" and the
inner loops are "in-cache". The resulting codes have almost the same performance
regardless of the amount of data, but all these code transformations
require a difficult analysis of cache behavior. In this paper, we
present a recursive implementation of some routines from a
numerical linear algebra library. This implementation leads to cache-sensitive
codes due to the "natural" partition of data, without the need to analyze
the cache behavior.
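
To illustrate the idea of a "natural" recursive partition, the following is a minimal sketch in C of a recursive matrix multiplication, C += A * B, on square row-major matrices. It is not taken from the routines discussed in this paper. The function names, the threshold BASE, and the choice of matrix multiplication as the example are all assumptions made for illustration. Each call splits the operands into quadrants until the sub-blocks are small, so at some recursion depth the working set fits in cache without any explicit blocking factor being derived from the cache parameters.

```c
#include <stddef.h>

/* Hypothetical cutoff: sub-blocks of this size or smaller are handled
   by plain loops. The value 16 is an assumption, not a tuned parameter. */
#define BASE 16

/* Plain triple loop: C += A * B for n x n blocks stored row-major
   inside larger matrices with row stride ld. */
static void matmul_base(size_t n, size_t ld,
                        const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double a = A[i * ld + k];
            for (size_t j = 0; j < n; j++)
                C[i * ld + j] += a * B[k * ld + j];
        }
}

/* Recursive C += A * B. The matrices are split into four quadrants and
   the eight quadrant products are computed recursively. Odd sizes fall
   back to the plain loops to keep the sketch short. */
void matmul_recursive(size_t n, size_t ld,
                      const double *A, const double *B, double *C)
{
    if (n <= BASE || n % 2 != 0) {
        matmul_base(n, ld, A, B, C);
        return;
    }
    size_t h = n / 2;
    const double *A11 = A,          *A12 = A + h;
    const double *A21 = A + h * ld, *A22 = A + h * ld + h;
    const double *B11 = B,          *B12 = B + h;
    const double *B21 = B + h * ld, *B22 = B + h * ld + h;
    double *C11 = C,          *C12 = C + h;
    double *C21 = C + h * ld, *C22 = C + h * ld + h;

    matmul_recursive(h, ld, A11, B11, C11);
    matmul_recursive(h, ld, A12, B21, C11);
    matmul_recursive(h, ld, A11, B12, C12);
    matmul_recursive(h, ld, A12, B22, C12);
    matmul_recursive(h, ld, A21, B11, C21);
    matmul_recursive(h, ld, A22, B21, C21);
    matmul_recursive(h, ld, A21, B12, C22);
    matmul_recursive(h, ld, A22, B22, C22);
}
```

A caller would invoke matmul_recursive(n, n, A, B, C) on full n x n matrices with C initialized beforehand. In contrast to a hand-blocked loop nest, the only size-related constant here is the recursion cutoff, which bounds call overhead rather than encoding a specific cache size.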