PhD Proposal
SOFTWARE-ORIENTED DISTRIBUTED SHARED CACHE MANAGEMENT FOR CHIP MULTIPROCESSORS
Lei Jin
Friday February 20, 2009
2:30 pm - TBD
Abstract
In this work, we propose a new software-oriented approach for managing the performance of programs on the distributed shared L2 caches of a tile-based chip multiprocessor (CMP). The conventional hardware-oriented shared cache scheme loses performance because it blindly distributes data across tiles, even data that is predominantly accessed by a single thread. The rigorous data attraction of the private cache scheme, on the other hand, leads to excessive capacity misses. Our proposed software-oriented approach infers data affinity hints from programs through on-line or off-line analysis of L2 cache access traces. The OS uses these hints to guide data placement in the L2 cache through page coloring. By off-loading the management task onto software, our scheme deviates substantially from previously proposed hardware-based cache management approaches and opens up new opportunities for CMP cache optimization. The flexibility of software analysis allows us to apply different optimization strategies to different types of programs. For single-threaded programs, our scheme exploits the trade-off between cache miss rate and cache access latency on shared distributed L2 caches, independent of processor scale. For multithreaded programs, our scheme employs a novel machine-learning-based data affinity analysis technique for dynamically allocated data structures. The derived hints are independent of program inputs and microarchitecture configurations. Our experimental results demonstrate that the proposed approach is very effective in reducing the number of remote accesses in the shared cache organization. On a 16-tile CMP, the page coloring scheme with on-line cache access cost analysis for single-threaded programs outperforms the conventional shared caching scheme by up to 191%, and by 32% on average. For multithreaded programs, our approach achieves a 10% performance improvement over the shared cache scheme by utilizing data affinity hints.
When combined with the victim replication technique, our approach secures an additional performance gain of 9%, performing 19% and 9% better than the shared cache scheme and the private cache scheme, respectively.
Dissertation Adviser
Dr. Sangyeun Cho, Department of Computer Science
Committee Members
Dr. Bruce Childers
Dr. Rami Melhem
Dr. Onur Mutlu
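
Illustrative sketch
The abstract describes the OS steering pages to L2 cache slices through page coloring, guided by data affinity hints. The sketch below is one possible way to picture that idea and is not the implementation from the proposal. It assumes a 4x4 mesh (16 tiles), assumes that the low bits of the physical frame number select the home L2 slice, and uses hypothetical names throughout (frame_color, hop_distance, preferred_tile, ColoredFrameAllocator). The affinity hint is modeled simply as per-tile access counts for a page.

```python
from collections import defaultdict

MESH_WIDTH = 4                       # assumption: 4x4 mesh, i.e. a 16-tile CMP
NUM_TILES = MESH_WIDTH * MESH_WIDTH


def frame_color(frame_number):
    """Assumed mapping: low bits of the frame number pick the home L2 slice."""
    return frame_number % NUM_TILES


def hop_distance(tile_a, tile_b):
    """Manhattan distance between two tiles on the assumed mesh."""
    ax, ay = tile_a % MESH_WIDTH, tile_a // MESH_WIDTH
    bx, by = tile_b % MESH_WIDTH, tile_b // MESH_WIDTH
    return abs(ax - bx) + abs(ay - by)


def preferred_tile(accesses_by_tile):
    """Affinity hint: the tile issuing the most accesses (ties go to the lowest id)."""
    return min(accesses_by_tile, key=lambda t: (-accesses_by_tile[t], t))


class ColoredFrameAllocator:
    """Hands out free physical frames whose color is close to a target tile."""

    def __init__(self, free_frames):
        self.free_by_color = defaultdict(list)
        for frame in free_frames:
            self.free_by_color[frame_color(frame)].append(frame)

    def allocate(self, target_tile):
        # Consider only colors that still have free frames.
        candidates = [c for c, frames in self.free_by_color.items() if frames]
        if not candidates:
            raise MemoryError("no free physical frames")
        # Prefer the target tile's own slice, then the nearest slice on the mesh.
        color = min(candidates, key=lambda c: (hop_distance(c, target_tile), c))
        return self.free_by_color[color].pop()
```

In this sketch, a page accessed mostly by tile 5 would receive a frame whose color is 5 while such frames remain free. Once they run out, the allocator falls back to a neighboring slice, trading a slightly longer access latency for continued use of on-chip capacity.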





