Founded in 1966

Tree-based Overlay Networks for Scalable, Reliable Tools and Applications.

Dorian Arnold (University of Wisconsin-Madison)

Monday February 11, 2008
10:00 a.m. - SENSQ 5317

Refreshments at 9:30 a.m.

Hosted by Sangyeun Cho

Abstract

HPC systems continue to grow in size and complexity, making the development of scalable software systems increasingly difficult. As a result, very few tools and applications run effectively, or at all, at today's largest scales (tens and hundreds of thousands of processors). To make matters worse, million-processor systems are scheduled for availability within the next two to four years.

Tree-based Overlay Networks (TBONs) have proven to be an effective computational model for scalable distributed tools and applications. A TBON is a network of hierarchically organized processes that exploits the logarithmic scaling properties of trees to provide scalable data multicast, gather, and in-network aggregation. In this talk, I will describe the TBON model, demonstrating its power and flexibility with scalability results up to 131,072 processors from a variety of application domains. I will also describe our novel TBON failure recovery model, state compensation, which relies on inherent information redundancies amongst TBON processes. State compensation features fast, decentralized tree reconstruction and state recovery protocols involving a small subset of the tree and no process coordination. The protocols are scalable because their performance is a function of the tree's fan-out, not total size. A tree with a fan-out of 64 recovers from failures in milliseconds; with only four levels, such a tree supports over 16,000,000 processes!
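
The scaling arithmetic in the abstract can be checked with a short sketch. This is an illustration only, not code from the talk or from any TBON implementation: the function names tbon_leaf_capacity and tree_reduce are hypothetical, and the sketch assumes that "four levels" means four levels of fan-out below the root, so a full tree has 64^4 leaf processes. The second function simulates in-network aggregation on a single machine, with each parent combining the results of at most fan_out children.

```python
from typing import Callable, List, Sequence, Tuple


def tbon_leaf_capacity(fan_out: int, levels: int) -> int:
    """Leaf count of a full tree with `levels` levels of fan-out below the root.

    Assumption: every internal process has exactly `fan_out` children.
    """
    return fan_out ** levels


def tree_reduce(
    values: Sequence[int],
    fan_out: int,
    combine: Callable[[Sequence[int]], int],
) -> Tuple[int, int]:
    """Aggregate leaf values level by level, as a tree of processes would.

    Each pass stands for one tree level: every parent combines the
    partial results of up to `fan_out` children. Returns the final
    result and the number of levels that were needed.
    """
    if not values:
        raise ValueError("need at least one leaf value")
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    current: List[int] = list(values)
    levels = 0
    while len(current) > 1:
        current = [
            combine(current[i:i + fan_out])
            for i in range(0, len(current), fan_out)
        ]
        levels += 1
    return current[0], levels


if __name__ == "__main__":
    # Capacity quoted in the abstract: fan-out 64, four levels.
    print(tbon_leaf_capacity(64, 4))
    # Summing 4096 leaf values with fan-out 64 takes two levels.
    print(tree_reduce(range(4096), 64, sum))
```

Under these assumptions, the number of aggregation levels grows with the logarithm (base fan-out) of the leaf count, which is the property the abstract refers to as the logarithmic scaling of trees.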

Biography of speaker

Dorian Arnold is a doctoral candidate and Intel Foundation Ph.D. fellow in the Computer Sciences Department at the University of Wisconsin. He holds an M.S. degree in Computer Science from the University of Tennessee and a B.S. degree in Mathematics and Computer Science from Regis University (Denver, CO). From 1999 to 2001, Dorian served as technical lead of the NetSolve project at the University of Tennessee's Innovative Computing Laboratory. In 2006, Dorian was a technical scholar at Lawrence Livermore National Laboratory. His research focuses on the performance and scalability issues of large distributed systems, including efficient communication and runtime data analysis, fault tolerance, and system deployment.
