Publications
listed by year
2018
CMH: Compression Management for Improving Capacity in the Hybrid Memory Cube, , ACM International Conference on Computing Frontiers (CF), May 2018
Artifact Evaluation: FAD or Real News?, , International Conference on Data Engineering (ICDE), April 2018 (abstract, short talk)
HMCSP: Reducing Transaction Latency of CSR-based SPMV in Hybrid Memory Cube, , International Symposium on Performance Analysis of Systems and Software (ISPASS), April 2018 (short)
2017
DrMP: Mixed Precision-Aware DRAM for High Performance Approximate and Precise Computing, , Conf. on Parallel Architectures and Compilation Techniques (PACT), September 2017
Artifact Evaluation: Is it a Real Incentive?, , IEEE 13th International Conference on e-Science (e-Science), October 2017 (WSSSPE workshop)
On the Restore Time Variations of Future DRAM Memory, , ACM Transactions on Design Automation of Electronic Systems (TODAES), February 2017
Evaluating Interactive Archives, , Gateways, October 2017
Quality of Service Support for Fine-Grained Sharing on GPUs, , International Symposium on Computer Architecture (ISCA), June 2017
2016
Asteroid: Scalable Online Memory Diagnostics for Multi-core, Multi-socket Servers, , International Journal of Parallel Programming (IJPP), Volume 44, Issue 5, pp. 949-974, October 2016
AWARD: Approximation-Aware Restore in Further Scaling DRAM, , The International Symposium on Memory Systems (MEMSYS), Alexandria, Virginia, October 2016
Concurrent Migration of Multiple Pages in Software Managed Hybrid Main Memory, , International Conference on Computer Design (ICCD), Phoenix, Arizona, October 2016
Live Code Update for IoT Devices in Energy Harvesting Environments, , IEEE Nonvolatile Memory Systems and Applications Symposium (NVMSA), Daegu, Korea, August 2016
Restore Truncation for Performance Improvement in Future DRAM Systems, , IEEE the 22nd International Symposium on High-Performance Computer Architecture (HPCA), Barcelona, Spain, March 2016
Simultaneous Multikernel GPU: Multi-tasking Throughput Processors via Fine-Grained Sharing, , IEEE the 22nd International Symposium on High-Performance Computer Architecture (HPCA), Barcelona, Spain, March 2016
Symmetry-agnostic Coordinated Management of the Memory Hierarchy in Multi-core Systems, , ACM Transactions on Architecture and Code Optimization (TACO), to appear
2015
Achieving Yield, Density and Performance Effective DRAM at Extreme Technology Sizes, , International Symposium on Memory Systems (MEMSYS), Washington, DC, October 2015
Implications of Memory Interference for Composed HPC Applications, , International Symposium on Memory Systems (MEMSYS), Washington, DC, October 2015
Characterizing the Overhead of Software-Managed Hybrid Main Memory, , IEEE Int'l. Symp. on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), Atlanta, Georgia, October 2015
Simultaneous Multikernel: Fine-grained Sharing of GPGPUs, , IEEE Computer Architecture Letters (CAL), September 2015
HMMSim: A Simulator for Hardware-Software Co-Design of Hybrid Main Memory, , IEEE Nonvolatile Memory Systems and Applications Symposium (NVMSA), Hong Kong, August 2015
Stream Query Processing on Emerging Memory Architectures, , IEEE Nonvolatile Memory Systems and Applications Symposium (NVMSA), Hong Kong, August 2015
Performance Modeling of Multithreaded Programs for Mobile Asymmetric Chip Multiprocessors, , IEEE International Conference on Embedded Software and Systems (ICESS), New York City, New York, August 2015
Asteroid: Scalable Online Memory Diagnostics, , ACM International Conference on Computing Frontiers (CF), Ischia, Italy, May 2015
Understanding the Limiting Factors of Page Migration in Hybrid Main Memory, , ACM International Conference on Computing Frontiers (CF), Ischia, Italy, May 2015 (poster)
Supporting Superpages in Non-Contiguous Physical Memory, , IEEE International Symposium on High Performance Computer Architecture (HPCA), San Francisco Bay Area, California, February 2015
Exploiting DRAM Restore Time Variations in Deep Sub-micron Scaling, , Design, Automation and Test in Europe (DATE), Grenoble, France, March 2015
A Roadmap and Plan of Action for Community-Supported Empirical Evaluation in Computer Architecture, , Operating Systems Review: Special Issue on Repeatability and Sharing of Experimental Artifacts (OSR), January 2015
2014
Concurrent Page Migration for Mobile Systems with OS-Managed Hybrid Memory, , ACM International Conference on Computing Frontiers (CF), Cagliari, Italy, May 2014
Program Affinity Performance Models for Performance and Utilization, , Design Automation and Test in Europe (DATE), March 2014
COMeT+: Continuous Online Memory Testing with Multi-threading Extension, , IEEE Transactions on Computers (TC), Volume 63, Issue 7, July 2014
Bit Mapping for Balanced PCM Cell Programming, , 5th Annual Non-volatile Memories Workshop (NVMW), San Diego, California, March 2014
Writeback-Aware Bandwidth Partitioning for Multi-core Systems with PCM, , 5th Annual Non-volatile Memories Workshop (NVMW), San Diego, California, March 2014
2013
Writeback-Aware Bandwidth Partitioning for Multi-core Systems with PCM, , International Conference on Parallel Architectures and Compilation Techniques (PACT), Edinburgh, Scotland, September 2013
Bit Mapping for Balanced PCM Programming, , International Symposium on Computer Architecture (ISCA), Tel Aviv, Israel, June 2013
Hardware Assisted Cooperative Integration of Wear-Leveling and Salvaging for Phase Change Memory, , ACM Transactions on Architecture and Code Optimization (TACO), Volume 10, Issue 2, May 2013
Automatic Generation of Program Affinity Policies using Machine Learning, , ETAPS International Conference on Compiler Construction (CC), March 2013
FPB: Fine-grained Power Budgeting to Improve Write Throughput of Multi-level Cell Phase Change Memory, , 4th Annual Non-volatile Memories Workshop (NVMW), San Diego, California, March 2013 (short version for presentation of MICRO 2012 paper)
Delta-compressed Caching for Overcoming the Write Bandwidth Limitation of Hybrid Main Memory, , ACM Transactions on Architecture and Code Optimization (TACO), January 2013
2012
FPB: Fine-grained Power Budgeting to Improve Write Throughput of Multi-level Phase Change Memory, , The 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Vancouver, Canada, December 2012
Improving Write Operations in MLC Phase Change Memory, , 3rd Annual Non-volatile Memories Workshop (NVMW), San Diego, California, March 2012 (short version for presentation of HPCA 2012 paper)
Writeback-aware Partitioning and Replacement for Last-Level Cache in Phase-Change Main Memory Systems, , 3rd Annual Non-volatile Memories Workshop (NVMW), San Diego, California, March 2012 (short version for presentation of TACO/HiPEAC 2012 paper)
REEact: A Customizable Virtual Execution Manager for Multicore Platforms, , International Conference on Virtual Execution Environments, London, United Kingdom, March 2012
Using Utility Prediction Models to Dynamically Choose Program Thread Counts, , IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), New Brunswick, New Jersey, April 2012
Improving Write Operations in MLC Phase Change Memory, , International Symposium on High Performance Computer Architecture (HPCA), New Orleans, Louisiana, February 2012
Writeback-aware Partitioning and Replacement for Last-Level Cache in Phase-Change Main Memory Systems, , ACM Transactions on Architecture and Code Optimization (TACO), Special Issue on High-Performance and Embedded Architectures and Compilers (HiPEAC), Paris, France, January 2012
Enabling Dynamic Binary Translation in Embedded Systems with Scratchpad Memory, , ACM Transactions on Embedded Computing Systems (TECS), Volume 11, Issue 4, December 2012
2011
COMeT: Continuous Online Memory Test, , IEEE Pacific Rim International Symposium on Dependable Computing (PRDC), Pasadena, California, December 2011
Real-Time Scheduling for Phase Change Main Memory Systems, , The 8th IEEE International Conference on Embedded Software and Systems (ICESS-11), Changsha, China, November 2011 (received Best Paper award)
Jazz2: A Flexible and Extensible Framework for Structural Testing in a Java VM, , 9th International Conference on the Principles and Practice of Programming in Java (PPPJ), Copenhagen, Denmark, August 2011
Evaluating Indirect Branch Handling Mechanisms in Software Dynamic Translation Systems, , ACM Transactions on Architecture and Code Optimization (TACO), Accepted February 2011, appeared July 2011 (Vol. 8, No. 2)
LLS: Cooperative Integration of Wear-Leveling and Salvaging for PCM Main Memory, , International Conference on Dependable Systems and Networks (DSN), Hong Kong, China, June 2011
Inflation and Deflation of Self-Adaptive Applications, , 6th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), Waikiki, Honolulu, Hawaii, May 2011
Analyzing the Impact of Useless Write-backs on Endurance and Energy Consumption of PCM Main Memory, , IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, Texas, April 2011
DEFCAM: A Design and Evaluation Framework for Defect-Tolerant Cache Memories, , ACM Transactions on Architecture and Code Optimization, Accepted (with minor revisions), November 2010 (to appear)
CloudCache: Expanding and Shrinking Private Caches, , 17th International Symposium on High-Performance Computer Architecture (HPCA), San Antonio, Texas, February 12-16, 2011
Demand Code Paging for NAND Flash in MMU-less Embedded Systems, , Design Automation and Test in Europe (DATE), Grenoble, France, March 14-18, 2011
Impact of Process Variation on Endurance Algorithms for Wear-Prone Memories, , Design Automation and Test in Europe (DATE), Grenoble, France, March 14-18, 2011
2010
StealthWorks: Emulating Errors in Memory, , International Conference on Runtime Verification (RV), Malta, November 1-4, 2010 (tool paper)
Using PCM in Next-Generation Embedded Space Applications, , IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Stockholm, Sweden, April 12-15, 2010
Increasing PCM Main Memory Lifetime, , Design, Automation and Test in Europe (DATE), Dresden, Germany, March 8-12, 2010
StimulusCache: Boosting Performance of Chip Multiprocessors with Excess Cache, , 16th International Symposium on High-Performance Computer Architecture (HPCA), Bangalore, India, January 9-14, 2010
PERFECTORY: A Fault-Tolerant Directory Memory Architecture, , IEEE Transactions on Computers (TC), Accepted May 2009, appeared May 2010, Vol. 59, No. 5, pp. 638-650
2009
Detecting Bugs in Register Allocation, , ACM Transactions on Programming Languages and Systems (TOPLAS), Accepted October 2009
Heterogeneous Code Cache: Using Scratchpad and Main Memory in Dynamic Binary Translators, , 46th Design Automation Conference (DAC), San Francisco, California, July 2009
MCP: An Energy-Efficient Code Distribution Protocol for Multi-Application WSNs, , International Conference on Distributed Computing in Sensor Systems (DCOSS'09), Marina Del Rey, California, June 2009
Addressing the Challenges of DBT for the ARM Architecture, , ACM Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'09), Dublin, Ireland, June 2009
A Framework for Exploring Optimization Properties, , International Conference on Compiler Construction (CC), York, United Kingdom, March 2009
Transparent Debugging of Dynamically Optimized Code, , International Symposium on Code Generation and Optimization (CGO-2009), Seattle, Washington, March 2009
2008
Reducing Pressure in Bounded DBT Code Caches, , International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), Atlanta, Georgia, October 2008
Running a Java VM inside an Operating System Kernel: A Networking Case Study, , ACM International Conference on Virtual Execution Environments (VEE), Seattle, Washington, March 2008
Integrated CPU and Cache Power Management, , International Conference on High-Performance Embedded Architectures and Compilers (HiPEAC'08), Goteborg, Sweden, January 2008
2007
Exploring the Interplay of Yield, Area and Performance in Processor Caches, , IEEE International Conference on Computer Design (ICCD), Lake Tahoe, CA, October 2007
Fragment Cache Management for Dynamic Binary Translators in Embedded Systems with Scratchpad, , International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), Salzburg, Austria, October 2007
Limits for a feasible speculative trace reuse implementation, , International Journal of High Performance Systems Architecture, InderScience Publishers, 2007, Vol. 1, No. 1, pp. 69-76
Integrated CPU and L2 Cache Voltage Scaling using Machine Learning, , ACM Conference on Languages, Compilers, and Tools for Embedded Systems, San Diego, California, June 2007
Virtual Execution Environments: Support and Tools, , Workshop on Next Generation Software, International Symposium on Parallel and Distributed Systems, Long Beach, California, March 2007 (Invited)
Energy Conservation using Power-Aware Cached-DRAM, , IEEE Transactions on Computers (TC), Accepted February 2007
Performance of Graceful Degradation for Cache Faults, , IEEE International Symposium on VLSI, Porto Alegre, Brazil, May 2007
Integrated CPU and L2 Cache Frequency/Voltage Scaling using Supervised Learning, , HiPEAC Workshop on Statistical and Machine Learning Approaches Applied to Architectures and Compilation (SMART'07), Ghent, Belgium, January 2007
Evaluating Indirect Branch Handling Mechanisms in Software Dynamic Translation Systems, , ACM/IEEE International Symposium on Code Generation and Optimization (CGO), San Jose, California, March 2007
2006
Power Aware Mapping of Real-Time Tasks to Multiprocessors, , The Handbook of Parallel Computing: Models, Algorithms, and Applications, Edited by Sanguthevar Rajasekaran et al., CRC Press, 2006
A Speculative Trace Reuse Architecture with Reduced Hardware Requirements, , IEEE Int'l. Symp. on Computer Architecture and High Performance Computing (SBAC-PAD), Ouro Preto, Brazil, October 2006
Catching and Identifying Bugs in Register Allocation, , 13th International Static Analysis Symposium, Seoul, Korea, August 2006
Evaluating Fragment Creation Policies for SDT Systems, , 2nd Int'l. Conf. on Virtual Execution Environments, Ottawa, Canada, June 2006
Profit-driven Scalar Optimization, , ACM Transactions on Architecture and Code Optimization (TACO), Accepted May 2006, appeared in Vol. 3, Issue 3, pp. 231-262, September 2006
Power Management in External Memory using Power-Aware Cached-DRAM, , Int'l. Journal on Embedded Systems, Accepted January 2006, appeared Vol. 3, Issue 1/2, 2007
2005
Near-memory Caching for Improved Energy Consumption, , IEEE Int'l. Conf. on Computer Design (ICCD'05), San Jose, California, October 2005
TDB: A Source-Level Debugger for Dynamically Translated Programs, , ACM Sixth Int'l. Symp. on Automated and Analysis-Driven Debugging (AADEBUG'05), Monterey, California, September 2005.
Planning for Code Buffer Management in Distributed Virtual Execution Environments, , ACM/USENIX Conference on Virtual Execution Environments (VEE'05), Chicago, Illinois, June 2005
Energy Conservation in Memory Hierarchies using Power-Aware Cached-DRAM, , Proceedings of the Schloss Dagstuhl Seminar on Power-Aware Computing Systems, book chapter to be published by Springer-Verlag, June 2005.
Collaborative Operating System and Compiler Power Management for Real-Time Applications, , ACM Transactions on Embedded Computing Systems (TECS), Accepted April 2005.
Compile-time planning for overhead reduction in software dynamic translators, , International Journal on Parallel Programming, Vol. 33, No. 2-3, pp. 103-114, Appeared June 2005
Jazz: A tool for demand-driven structural testing, , 14th ETAPS Int'l. Conf. on Compiler Construction (CC), Edinburgh, Scotland, April 2005 (tool paper)
Demand-driven structural testing with dynamic instrumentation, , ACM SIGSOFT Int'l. Conf. on Software Engineering (ICSE'05), St. Louis, Missouri, May 2005
A Model-based Framework: An Approach to Profit-Driven Optimization, , ACM Int'l. Conf. on Code Generation and Optimization (CGO'05), San Jose, California, March 2005
2004
Instrumentation in Software Dynamic Translators for Self-Managed Systems, , ACM SIGSOFT Workshop on Self-Managed Systems during the ACM SIGSOFT 12th Int'l. Symp. on the Foundations of Software Engineering, Long Beach, California, October 31-November 1, 2004.
Value Predictors for Reuse through Speculation on Traces, , IEEE 16th Symp. on Computer Architecture and High Performance Computing (SBAC-PAD'04), Foz do Iguaçu, Brazil, October 2004, pp. 48-55.
An Infrastructure for Designing Custom Embedded Wide Counterflow Pipelines, , Journal of Microprocessors and Microsystems, Accepted July 2004, Volume 29(1), February 2005, pp. 27-40.
Compact binaries with code compression in a software dynamic translator, , Conference on Design, Automation and Test in Europe (DATE'04), Paris, France, February 2004, pp. 1052-1057, Vol. 2.
Profile Guided Management of Code Partitions for Embedded Systems, , Conference on Design, Automation and Test in Europe (DATE'04), Paris, France, February 2004, pp. 1396-1397, Vol. 2.
Overhead reduction techniques for software dynamic translation, , NSF Next Generation Software Workshop, Int'l. Parallel and Distributed Processing Symposium, Santa Fe, New Mexico, April 2004
2003
SoftTest: A framework for software testing of Java programs, , Eclipse Technology Exchange Workshop, Anaheim, California, October 27, 2003
The Limits of Speculative Trace Reuse on Deeply Pipelined Processors, , IEEE 15th Symp. on Computer Architecture and High Performance Computing, Sao Paulo/SP, Brazil, November 2003, pp. 36-44.
Flexible Instrumentation for Software Dynamic Translation, , Workshop on Exploring the Trace Space for Dynamic Optimization Techniques, San Francisco, California, June 2003
Predicting the Impact of Optimizations for Embedded Systems, , ACM Conference on Languages, Compilers, and Tools for Embedded Systems, San Diego, California, June 2003
Energy Management for Real-Time Embedded Applications with Compiler Support, , ACM Conference on Languages, Compilers, and Tools for Embedded Systems, San Diego, California, June 2003
Collaborative Operating System and Compiler Power Management for Real-Time Applications, , IEEE Real-Time/Embedded Technology and Applications Symposium, Washington, DC, May 2003, pp. 133-141
Short Courses in System-on-a-Chip (SoC) Design, , IEEE Int'l. Conference on Microelectronic Systems Education (MSE), Anaheim, California, June 2003, pp. 126-127
Custom Wide Counterflow Pipelines for High Performance Embedded Applications, , IEEE Transactions on Computers (TC), Accepted January 2003, Vol. 53, No. 2, February 2004, pp. 141-158
Scheduling with Dynamic Voltage/Speed Adjustment Using Slack Reclamation in Multi-processor Real-Time Systems, , IEEE Transactions on Parallel and Distributed Systems (TPDS), Accepted January 2003, appeared July 2003, Vol. 14, No. 7, pp. 686-700
Retargetable and Reconfigurable Software Dynamic Translation, , ACM SIGMICRO Int'l. Conf. on Code Generation and Optimization, San Francisco, California, March 2003, pp. 36-47
Continuous Compilation: A New Approach to Aggressive and Adaptive Code Transformation, , NSF Next Generation Software Workshop, Int'l. Parallel and Distributed Processing Symposium, Nice, France, April 2003
2002
Compilers and Operating Systems for Low Power, , Kluwer Academic Publishers, 2002
2001
Scheduling with Dynamic Voltage/Speed Adjustment Using Slack Reclamation in Multi-Processor Real-Time Systems, , 22nd IEEE Real-Time Systems Symposium, London, UK, December 2001, pp. 84-94
Toward The Placement of Power Management Points in Real Time Applications, , Workshop on Compilers and Operating Systems for Low Power, October 2001
2000
Adapting Processor Supply Voltage to Instruction-Level Parallelism, , Koolchips Workshop, Monterey, California, December 2000
Width-Sensitive Scheduling for Resource Constrained VLIW Processors, , ACM Workshop on Feedback-Directed and Dynamic Optimization, Monterey, California, December 2000
Compiler-Assisted Dynamic Power-Aware Scheduling for Real-Time Applications, , Workshop on Compilers and Operating Systems for Low Power, Philadelphia, Pennsylvania, October 2000
Custom Wide Counterflow Pipelines for High-Performance Embedded Applications, , Int'l. Conference on Parallel Architecture and Compilation Techniques (PACT'00), October 2000, pp. 57-68.
Reordering Memory Bus Transactions for Reduced Energy Consumption, , ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems, Vancouver, Canada, June 2000
An Infrastructure for Designing Custom Embedded Counterflow Pipelines, , Hawaii Int'l. Conf. on System Sciences, Maui, Hawaii, January, 2000
1999
Automatic Architectural Design of Wide-Issue Counterflow Pipelines, , Workshop on Compiler and Architecture Support for Embedded Systems (CASES'99), Washington, DC, 1999
Architectural Considerations for Application-Specific Counterflow Pipelines, , IEEE Conf. on Adv. Research in VLSI (ARVLSI'99), Atlanta, Georgia, March 1999, pp. 3-22
1998
A Design Environment for Counterflow Pipeline Synthesis, , ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES'98), Lecture Notes in Computer Science, Springer, June 1998, pp. 223-234, Vol. 1474.
1993
Memory Bandwidth Optimizations for Wide-Bus Machines, , Hawaii Int'l. Conf. on System Sciences, January 1993, pp. 466-475, Vol. 1.