Leveraging Transverse Reads to Correct Alignment Faults in Domain Wall Memories
Proc. of the International Conference on Dependable Systems and Networks (DSN), Portland, OR (June 2019)
S. Ollivier, D. Kline, R. Kawsher, R. Melhem, S. Bhanja and A. Jones
[google]


Optimal Placement of In-memory Checkpoints under Heterogeneous Failure Likelihood
Proc. of the Int. Conf. on Parallel and Distributed Processing (IPDPS), Rio de Janeiro, Brazil (May 2019)
Z. Hussain, T. Znati and R. Melhem
[google]


CoLoR: Co-Located Rescuers for Fault Tolerance in HPC Systems
Proc. of the International Conference on Parallel and Distributed Systems (ICPADS), Sentosa, Singapore (December 2018)
Z. Hussain, X. Cui, T. Znati and R. Melhem
[google]


Improving Sustainability Through Disturbance Crosstalk Mitigation in Deeply Scaled Phase-change Memory
Proc. of the International Green and Sustainable Computing Conference (IGSC), Pittsburgh, PA. (October 2018)
S. Seyedzadeh, A. Jones and R. Melhem
[google]


Partial Redundancy in HPC Systems with Non-Uniform Node Reliabilities
Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’18), Dallas, TX (November 2018).
Z. Hussain, T. Znati and R. Melhem
[google]


Mitigating Word-line Crosstalk using Adaptive Trees of Counters
Proc. of the International Symposium on Computer Architecture (ISCA), Los Angeles, CA (June 2018).
S. Seyedzadeh, A. Jones and R. Melhem
[google]


A Systematic Fault-tolerant Computational Model for Both Crash Failures and Silent Data Corruption
Proc. of the 21st Conference on Innovation in Clouds, Internet and Networks (ICIN), Paris, France (February 2018).
X. Cui, Z. Hussain, T. Znati and R. Melhem
[google]


Rejuvenating Shadows: Fault Tolerance with Forward Recovery
Proc. of the Int. Conf. on High Performance Computing and Communications (HPCC), Bangkok, Thailand, (December 2017).
X. Cui, T. Znati and R. Melhem
[google]


Yoda: Judge me by my size, do you?
Proc. of the International Conference on Computer Design (ICCD), Boston, MA (November 2017).
J. Zhang, D. Kline Jr., L. Fang, R. Melhem and A. Jones
[google]


Dynamic Partitioning to Mitigate Stuck-at Faults in Emerging Memories
Proc. of the International Conference on Computer Aided Design (ICCAD), Irvine, CA. (November 2017).
J. Zhang, D. Kline Jr., L. Fang, R. Melhem and A. Jones
[google]


Holistic Energy Efficient Crosstalk Mitigation in DRAM
Proc. of the International Green and Sustainable Computing Conference (IGSC), Orlando, FL. (October 2017).
D. Kline, R. Melhem and A. Jones
[google]


Sustainable Fault Management and Error Correction for Next-Generation Main Memories
Proc. of the International Green and Sustainable Computing Conference (IGSC), Orlando, FL. (October 2017).
D. Kline, R. Melhem and A. Jones
[google]


Mitigating Bitline Crosstalk Noise in DRAM Memories
Proc. of the International Symposium on Memory Systems (MEMSYS), Washington, DC (October 2017).
S. Seyedzadeh, D. Kline, A. Jones and R. Melhem
[google]


Harvesting Underutilized Resources to Improve Responsiveness and Tolerance to Crash and Silent Faults for Data-intensive Applications
Proc. of the International Conference on Cloud Computing (IEEE CLOUD), Honolulu, HI, (June 2017).
D. Ganguly, M. Mofrad, T. Znati, R. Melhem and J. Lange
[google]


Adaptive and Power-Aware Resilience for Extreme-scale Computing
Proc. of the 16th IEEE International Conference on Scalable Computing and Communications (ScalCom), Toulouse, France (July 2016).
X. Cui, T. Znati and R. Melhem
[google]


Leveraging ECC to Mitigate Read Disturbance, False Reads and Write Faults in STT-RAM
Proc. of the International Conference on Dependable Systems and Networks (DSN), Toulouse, France (June 2016).
S. Seyedzadeh, R. Maddah, A. Jones and R. Melhem
[google]


Energy Consumption of Resilience Mechanisms in Large Scale Systems
Proc. of the 22nd Euromicro Int. Conference on Parallel, Distributed, and Network-Based Processing (PDP), Turin, Italy (February 2014).
B. Mills, T. Znati, R. Melhem, K. Ferreira and R. Grant
[google]


Profit Maximization for Resilient Cloud Computing
Proc. of the International Conference on Cloud Computing and Services Science (CLOSER), Barcelona, Spain (April 2014).
X. Cui, B. Mills, T. Znati and R. Melhem
[google]


Shadow Computing: An Energy-Aware Fault Tolerant Computing Model
Proc. of the International Conference on Computing, Networking and Communications (ICNC), Honolulu, HI (February 2014).
B. Mills, T. Znati and R. Melhem
[google]


Energy-aware Checkpointing of Divisible Tasks with Soft and Hard Deadlines
Proc. of the fourth International Green Computing Conference (IGCC), Arlington, VA (June 2013).
G. Aupy, A. Benoit, R. Melhem, P. Renaud-Goud and Y. Robert
[google]


Power of One Bit: Increasing Error Correction Capability with Data Inversion
Proc. of the Pacific Rim International Symposium on Dependable Computing (PRDC), Vancouver, Canada (December 2013).
R. Maddah, S. Cho and R. Melhem
[google]


RDIS: A Recursively Defined Invertible Set Scheme to Tolerate Multiple Stuck-At Faults in Resistive Memory
Proc. of the 42nd IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Boston, MA (June 2012).
R. Melhem, R. Maddah and S. Cho
[google]


Process Variation Tolerant Design for Nanophotonic Networks
Proc. of The International Symposium on Computer Architecture (ISCA), Portland, OR (June 2012).
Y. Xu, J. Yang and R. Melhem
[google]


Considering Link Qualities in Fault Tolerant Aggregation in Wireless Sensor Networks
Proc. of the IEEE Global Telecommunications Conference (Globecom’09), Honolulu, HI (December 2009)
S. Gobriel, S. Khattab, D. Mosse and R. Melhem
[google]


RideSharing: Fault Tolerant Aggregation in Sensor Networks Using Corrective Actions
Proc. of the Third Annual IEEE Communications Society Conference on Sensor, Mesh, and Ad Hoc Communications and Networks (SECON), Reston, VA (September 2006)
S. Gobriel, S. Khattab, D. Mosse, J. Brustoloni and R. Melhem
[ps/pdf]


The Effects of Energy Management on Reliability in Real-time Embedded Systems
Proc. of the International Conference on Computer Aided Design (ICCAD), San Jose, CA (Nov. 2004)
D. Zhu, R. Melhem, and D. Mosse
[ps/pdf]


Energy-Efficient Duplex and TMR Real-Time Systems
Proc. of the Real-Time Systems Symposium (RTSS), Austin, TX (Dec. 2002)
E. Elnozahy, R. Melhem and D. Mosse
[ps/pdf]


Power Aware Scheduling for AND/OR Graphs in Multi-Processor Real-Time Systems
Proc. of the International Conference on Parallel Processing (ICPP), Vancouver, B.C. (Aug. 2002)
D. Zhu, N. AbouGhazaleh, D. Mosse and R. Melhem
[ps/pdf]


Determining Optimal Processor Speeds for Periodic Real-Time Tasks with Different Power Characteristics
Proc. of the 12th Euromicro Conference on Real-time Systems, Delft, The Netherlands (June 2001)
H. Aydin, R. Melhem, D. Mosse and P. Mejia-Alvarez
[ps/pdf]


Optimal Scheduling of Imprecise Computation Tasks in the Presence of Multiple Faults
Proc. of the Real-Time Computing Systems and Applications Symp., Cheju, Korea (Dec. 2000)
H. Aydin, D. Mosse, and R. Melhem
[ps/pdf]


Tolerating Faults while Maximizing Reward
Proc. of the 12th Euromicro Conference on Real-time Systems, Stockholm, Sweden (June 2000)
H. Aydin, R. Melhem and D. Mosse
[ps/pdf]


Scheduling Optional Computations in Fault-Tolerant Real-Time Systems
Proc. of the Real-Time Computing Systems and Applications Symp., Cheju, Korea (Dec. 2000)
P. Mejia Alvarez, H. Aydin, D. Mosse, and R. Melhem
[ps/pdf]


Reducing Message Overhead in TMR Systems
Proc. of the IEEE International Conference on Distributed Computing Systems (ICDCS '99), Dallas, TX (June 1999)
J. Ramirez and R. Melhem
[ps/pdf]


Implementation of a Transient Fault-tolerance Scheme on DEOS
Proc. of The Real-time Technology and Application Symposium, RTAS, Vancouver, Canada (June 1999)
L. Dong, R. Melhem, S. Ghosh, W. Heimerdinger and A. Larson
[ps/pdf]


Global Fault Tolerant Real-Time Scheduling on Multiprocessors
Proc. of The 10th IEEE Euromicro Real-Time Workshop, York, UK (June 1999)
F. Liberato, S. Lauzac, R. Melhem and D. Mosse
[ps/pdf]


Incorporating Error Recovery into the Imprecise Computation Model
Proc. of the International Conference on Real-Time Computing Systems and Applications (RTCSA '99), Hong Kong (Dec. 1999)
H. Aydin, R. Melhem and D. Mosse
[ps/pdf]


Fault Tolerant, Rate Monotonic Scheduling
IFIP International Conference on Dependable Computing for Critical Applications (DCCA), Garmisch, Germany (March 1997)
S. Ghosh, R. Melhem and D. Mosse
[google]


Enhancing Real-Time Schedules to Tolerate Transient Faults
Proc. of the 16th IEEE Real-Time Systems Symposium, Pisa, Italy, (1995)
S. Ghosh, R. Melhem and D. Mosse
[ps/pdf]


Fault-Tolerant Scheduling on Hard Real-Time Multiprocessor Systems
Proc. of the 8th Int. Parallel Processing Symposium, Cancun, Mexico (1994)
S. Ghosh, R. Melhem and D. Mosse
[ps/pdf]


Compiler Assisted Fault Detection for Distributed Memory Systems
Proc. of the 1994 Scalable High Performance Computing Conference, Knoxville, TN (1994)
C. Gong, R. Melhem and R. Gupta
[ps/pdf]


Analysis of a Fault-Tolerant Multiprocessor Scheduling Algorithm
Proc. of the 24th Fault-Tolerant Computing Symposium, Austin, TX (1994)
D. Mosse, R. Melhem and S. Ghosh
[ps/pdf]


Replicating Statement Execution for Fault Detection on Distributed Memory Multiprocessors
Proc. of the 1994 IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, College Station, TX (1994)
C. Gong, R. Melhem and R. Gupta
[ps/pdf]


Reconfiguration in Fault Tolerant 3D Meshes
Workshop on Defect and Fault Tolerance in VLSI Systems, Montreal, Canada (1994)
A. Chandra and R. Melhem
[ps/pdf]


Efficient Bi-level Reconfiguration Algorithms for Fault Tolerant Arrays
IEEE Int. Workshop on Defect and Fault Tolerance in VLSI Systems, Dallas, TX (1992)
R. Libeskind-Hadas, N. Shrivastava, R. Melhem and C. L. Liu
[ps/pdf]


Routing in Modular Fault Tolerant Multiprocessor Systems
Proc. of the 22nd International IEEE Symposium on Fault Tolerant Computing, Boston, MA (1992)
M. Alam and R. Melhem
[ps/pdf]


Reconfiguration of Computational Arrays with Multiple Redundancy
Proc. of the International Conference on Parallel Processing, St. Charles, Illinois (1991)
R. Melhem and J. Ramirez
[ps/pdf]


Embedding Rings in Hypercubes for Run-time Fault Tolerance
Proc. of the Fourth ISMM Conference on Parallel and Distributed Computing and Systems, Washington D.C. (1991)
F. Provost and R. Melhem
[google]


Efficient and Optimal Fault-to-Spare Assignment in Doubly Fault Tolerant Arrays
Proc. of the IEEE Int. Workshop on Defect and Fault Tolerance in VLSI Systems, Hidden Valley, PA (1991)
N. Shrivastava and R. Melhem
[ps/pdf]


Meshes with Flexible Redundancy
Proc. of the Second Workshop on Algorithms and Parallel VLSI Architectures, Bonas, France, (1991)
R. Melhem and J. Ramirez
[google]


Channel Multiplexing in Modular Fault Tolerant Multiprocessors
Proc. of the International Conference on Parallel Processing, St. Charles, Illinois (1991)
M. Alam and R. Melhem
[ps/pdf]


How to use an Incomplete Hypercube for Fault Tolerance
Proc. of the first European Workshop on Hypercube and Distributed Computers, Rennes, France (1989)
M. Alam and R. Melhem
[google]


Fault Tolerance and Reliable Routing in Augmented Hypercube Architectures
Proc. of the 8th. IEEE Phoenix Conference on Computers and Communications, Phoenix, AZ (1989)
M. Alam and R. Melhem
[ps/pdf]


Bi-Level Reconfigurations of Fault Tolerant Arrays in Bi-modal Computational Environments
Proc. of the 19th International IEEE Symposium on Fault Tolerant Computing, Chicago, IL (1989)
R. Melhem
[ps/pdf]


Fault Tolerant Embedding of Binary Trees and Rings into Hypercubes
Proc. of the International Workshop on Defect and Fault Tolerance in VLSI Systems, Springfield, MA (1988)
F. Provost and R. Melhem
[google]