PhD Proposal
Hardware-Oriented Cache Management for Large-Scale Chip Multiprocessors
Mohammed Hammoud
Tuesday September 29, 2009
1:00 pm - SENSQ 6106 Eli Lilly Room
Abstract
Crossing the billion-transistor-per-chip barrier has had a profound influence on the emergence of chip multiprocessors (CMPs) as a mainstream architecture of choice. As the realm of CMPs continues to expand, they must provide high and scalable performance. One of the key requirements for obtaining high performance from CMPs is effective management of the limited on-chip cache resources (typically the L2 cache) shared among co-scheduled threads. This thesis describes hardware-oriented caching solutions for distributed chip multiprocessors.
Growing non-uniform access latencies, interference misses, processor-memory speed and bandwidth gaps, and diverse workload characteristics pose key caching challenges for today's computer architects. Our exploration of the CMP cache management problem suggests a general CMP caching framework (CC-FR) that defines three main approaches to solving the problem: (1) data placement, (2) data retention, and (3) data relocation. To address these caching challenges and effectively employ CC-FR's approaches, we propose four schemes: cache equalizer (CE), adaptive controlled migration (ACM), dynamic cache clustering (DCC), and pressure-aware retention (PAR).
CE decouples the physical locations of cache blocks from their addresses in order to reduce misses caused by destructive interference. PAR, on the other hand, eliminates destructive interference by making better use of cache area. PAR synergistically discovers dead blocks and overwrites them with replaced (victim) blocks, thus retaining the replaced blocks on-chip. CE and PAR adopt CC-FR's data placement and retention approaches, respectively. ACM reduces non-uniform access latencies by promoting cache blocks towards L2 banks close to the requesting cores. ACM falls under CC-FR's data relocation category. Lastly, DCC constructs a cache cluster for each core and expands or contracts all clusters dynamically to match each core's cache demand. DCC falls under CC-FR's data placement category and addresses the challenges of diverse workload characteristics and growing non-uniform access latencies. Simulation results, obtained with a full-system simulator, demonstrate the effectiveness of the proposed schemes and show that they compare favorably with related work.
Dissertation Advisers
Dr. Rami Melhem and Dr. Sangyeun Cho, Department of Computer Science
Committee Members
Dr. Bruce Childers, Department of Computer Science
Dr. Jun Yang, Department of Electrical and Computer Engineering





