Approximation Accuracy in Cache Behavior Prediction

Lehre MNM Team Projekte Publikationen en

. Home . Lehre . Studentische Arbeiten . Masterarbeiten . ma-reuse-distance-statistical

Memory transfers are the performance bottleneck of many applications due to poor data locality and limited memory bandwidth. Code refactoring for better data locality can improve cache behavior, leading to significant performance boosts. Reuse distance, a measurure of data locality, is useful in identification and optimization of hot code regions exhibiting poor data locality.

Benefits

Successful completion of the bachelor's thesis enables you to enter responsible positions, for example, in the areas of artificial intelligence, gaming and high-frequency trading. The ability to systematically identify and resolve performance bottlenecks is a highly sought-after skill.

Performance matters! During this thesis you can:

Gain hands-on experience in high-performance computing and computer architecture
Have the opportunity to work with modern state-of-the-art hardware architectures
Deepen your knowledge in the C/C++ programming languages
Get in touch with the Future Computing Group at the Leibniz Supercomputing Centre (LRZ)

Motivation and Background

Reuse distance is defined as the number of unique memory locations referenced between a pair of references to the same memory location. On the granularity of cache lines, reuse distance can model spatial and temporal locality to assess cache behavior of applications. Assuming a fully associative cache with least recently used (LRU) replacement policy, predicting cache behavior with reuse distance is exact.

One of reuse distance's drawbacks is its high computational effort. To address this issue, recently, lightweight methods [1,2,3] combining hardware debug and profiling facilities with statistical approaches [4] have been developed. These methods measure reuse time, the total number of memory locations referenced between a pair of references to the same memory location, instead of reuse distance, and construct reuse distances from statistical considerations based on a Bernoulli process. Their approach, however, is based on the assumption that each memory location is referenced with equal probability. It is questionable whether this assumption holds, especially applied to highly irregular applications, such as sparse matrix-vector multiplication.

Goals and Tasks

In this thesis, you will explore the accuracy of cache behavior prediction with reuse distance in irregular applications. As an example application, you will use sequential and parallel sparse matrix-vector multiplications (SpMV), a ubiquitous kernel, for instance in simulations and graph algorithms.

In the context of this thesis, you will:

Adjust an existing C++ algorithm for reuse distance computation to include cache associativity
Design and conduct experiments with sequential and parallel SpMV on several state-of-the-art hardware architectures of the BEAST system
Compare three cache-behavior prediction approaches with cache-related hardware performance events obtained from SpMV execution:

Exact reuse distance
Statistical methods based on memory instrumentation
Lightweight methods based on statistical profiling
Cachegrind (a Valgrind tool)

Additionally to accuracy of these methods, you will compare the induced runtime overhead.

Prerequisites

Basic understanding and interest in computer architecture and multi-core memory hierarchys
Basic knowledge in high-performance and parallel computing (Parallel and High-Performance Computing or equivalent)
Proficiency in Linux, and C/C++ (Systempraktikum or equivalent)

Starting Point Literature

[1] Wang et al. 2019 Featherlight reuse-distance measurement
[2] Sasongko et al. 2021 ReuseTracker: Fast Yet Accurate Multicore Reuse Distance Analyzer
[3] Sasongko et al. 2023 Precise event sampling‐based data locality tools for AMD multicore architectures
[4] Shen et al. 2006 Accurate approximation of locality from time distance histograms
Beyls and. D’Hollander Reuse Distance as a Metric for Cache Behavior
Hennessy and Patterson Computer Architecture a Quantitative Approach (5th ed.)

Organization

Task setter: Prof. Dr. D. Kranzlmüller
Master's thesis duration: 6 Month
Students: 1
Supervisor: Sergej Breiter
Language: English

Funktionen

Approximation Accuracy in Cache Behavior Prediction

Benefits

Motivation and Background

Goals and Tasks

Prerequisites

Starting Point Literature

Organization