CS 3420
Fault Tolerant Parallel and Distributed Systems
Fall 1997
Mondays and Wednesdays from 10:00AM to 11:20AM
113 MIB
Instructor
Rami Melhem (melhem@cs.pitt.edu)
219 Mineral Industries Building,
phone: 624-8426
Office Hours:
Mondays from 1:00 to 3:00
Thursdays from 1:00 to 3:00
Course Description
This seminar course covers the principles of fault-tolerant
computing. Topics include error detecting and correcting codes;
hardware, software, and time redundancy techniques; fault-tolerant
multiprocessors; system diagnosis and fault tolerance software;
fault tolerance in distributed and real-time systems; and
performance and reliability evaluation techniques.
Reference textbooks (not required)
- Fault-Tolerant Computer System Design, by Dhiraj Pradhan - Prentice Hall.
- Fault Tolerance in Distributed Systems, by Pankaj Jalote - Prentice Hall.
Requirements and grading:
- One exam in October (15%).
- Paper presentations and class participation (25%).
- Class project and report (60%).
- Presentations by the instructor
  - Simple error detecting/correcting codes.
  - Types of redundancy (hardware, software and time redundancies).
  - Performance and reliability evaluation techniques.
  - Fault tolerance in distributed systems.
  - Fault tolerance in real-time systems.
  - Fault tolerance in multiprocessor systems.
- Presentations by students (possible topics)
- Presentations and demonstrations of class projects