Panos K. Chrysanthis - Vincenzo Liberatore - Kirk Pruhs
Our goal is to integrate data dissemination and multicast communication
techniques into a working software distribution that gives applications
middleware support for a scalable, multicast-based data management layer.
The time is ripe to exploit new and effective Internet multicast solutions
to relieve the scalability problems of data-intensive distributed applications.
Our approach will also impact data dissemination in wired and wireless local
area networks, as well as satellite links, where broadcasting is the principal
mode of communication.
Multicast communication raises many data management issues that either do not arise in unicast communication or require solutions different from the standard methods used in unicast settings. As an illustrative example, consider a highly scalable Web server. The objective of the Web server application is to scale to a large client population, and it achieves this scalability by using the middleware. Through the middleware, the server can disseminate data using any combination of the following three schemes: multicast push, multicast pull, and unicast pull.
In multicast push, the server repeatedly sends information to the clients without explicit client requests. (Television, for example, is a classic multicast push system.) Multicast push is an ideal fit for asymmetric communication links, such as satellites and base station methods, where there is little or no bandwidth from the client to the server. Because clients send no requests, multicast push is also ideal for achieving maximal scalability of Internet hot spots. However, repeated transmission consumes downstream bandwidth whether or not any client wants the data, so multicast push should generally be restricted to hot resources.
In multicast pull, the clients make explicit requests for resources, and the server broadcasts the responses to all members of the multicast group. If multiple clients request the same resource at approximately the same time, the server may aggregate these requests and broadcast the resource only once. One would expect this aggregation to improve user-perceived performance for the same reason that proxy caches do: different users commonly request the same resource. Multicast pull is a good fit for "warm" resources, for which repetitive multicast push cannot be justified but aggregating concurrent client requests is still advantageous. Traditional unicast pull is reserved for cold documents.
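The request aggregation performed in multicast pull can be sketched as follows. This is a minimal illustration, not the project's implementation: the class name `PullAggregator`, its methods, and the first-come-first-served ordering are all assumptions made for the example.

```python
from collections import OrderedDict


class PullAggregator:
    """Collects pending multicast-pull requests and merges duplicates.

    Hypothetical helper: the name and interface are illustrative only.
    """

    def __init__(self):
        # Maps resource URI -> set of waiting client ids,
        # kept in order of the first request for each URI.
        self._pending = OrderedDict()

    def request(self, client_id, uri):
        """Record a request; return True if the URI was not already pending."""
        is_new = uri not in self._pending
        self._pending.setdefault(uri, set()).add(client_id)
        return is_new

    def next_broadcast(self):
        """Pop the oldest pending URI and the clients one broadcast satisfies.

        Returns None when nothing is pending. Oldest-first order is an
        assumption; a real scheduler could use a different policy.
        """
        if not self._pending:
            return None
        return self._pending.popitem(last=False)
```

If clients `c1` and `c2` both request `/news.html` before it is sent, a single call to `next_broadcast()` returns that URI together with both client ids, so one transmission serves both requests.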
The end-user should not perceive that Web resources are downloaded with a variety of methods, as the browser and the middleware shield the user from the details of the multi-tier dissemination protocol.
In the Web server application, the document selection unit periodically gathers statistics on document popularity. Once statistics have been collected, the server partitions the resources into hot, warm, and cold documents. When a client wishes to request a Web document, it either downloads it from a multicast channel or requests it explicitly, depending on whether the document is hot. The server also broadcasts an index of sorted URIs, which allows the client to determine quickly whether the requested resource is in the hot broadcast set. In summary, the client identifies the multicast channel, downloads the appropriate portions of the index, and determines whether the resource is upcoming in the cyclic broadcast. If the requested resource is not in the hot broadcast set, the client makes an explicit request to the server and simultaneously starts listening to the warm multicast channel, if one is available. If the resource is cold, it is returned on the same connection. If the resource is warm, the client waits on the warm multicast channel until the requested resource is transmitted. The multicast pull scheduling component resolves contention among client requests for the use of the warm multicast channel and establishes the order in which pages are sent over that channel.
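The partitioning step and the client-side index lookup can be sketched as follows. The text does not specify how the hot, warm, and cold classes are derived from the statistics, so the two hit-count thresholds here are purely an assumption, as are all function names. The lookup uses binary search, which is one way a sorted index lets the client decide quickly.

```python
from bisect import bisect_left


def partition_by_popularity(hits, hot_threshold, warm_threshold):
    """Split URIs into sorted (hot, warm, cold) lists by request count.

    The threshold rule is a hypothetical policy chosen for illustration.
    """
    hot, warm, cold = [], [], []
    for uri, count in hits.items():
        if count >= hot_threshold:
            hot.append(uri)
        elif count >= warm_threshold:
            warm.append(uri)
        else:
            cold.append(uri)
    return sorted(hot), sorted(warm), sorted(cold)


def in_hot_index(sorted_hot_uris, uri):
    """Binary-search the sorted URI index for the requested resource."""
    i = bisect_left(sorted_hot_uris, uri)
    return i < len(sorted_hot_uris) and sorted_hot_uris[i] == uri


def client_action(sorted_hot_uris, uri):
    """Choose the client's next step as described in the text."""
    if in_hot_index(sorted_hot_uris, uri):
        return "listen on push channel"
    # Not hot: the client cannot tell warm from cold in advance, so it
    # requests explicitly and listens on the warm channel at the same time.
    return "request explicitly and listen on warm channel"
```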
In multicast push, the server periodically broadcasts hot resources to the clients. The server chunks hot resources into nearly equal-size pages that each fit into one datagram and then cyclically sends them on a single or a layered multicast channel along with index pages. The frequency and ordering of the pages within the multicast push channel are determined by the multicast push scheduling component. Upon receipt of the desired pages, the client can buffer them to reconstruct the original resource and can cache resources to satisfy future requests. The set of hot pages is cyclically multicast, and so received pages are current in that they cannot be more than one cycle out-of-date. Furthermore, certain types of consistency semantics can be guaranteed by transmitting additional information along with the control pages.
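The chunking, cyclic transmission, and client-side reconstruction described above can be sketched as follows. The page layout `(uri, seq, total, payload)` and the flat schedule, in which every page is sent once per cycle, are assumptions for illustration; in the middleware the frequency and ordering are chosen by the multicast push scheduling component, and index pages are omitted here.

```python
from itertools import cycle


def chunk_resource(uri, data, page_size):
    """Split a resource into pages of at most page_size bytes.

    Each page is a hypothetical tuple (uri, seq, total, payload).
    """
    total = max(1, -(-len(data) // page_size))  # ceiling division
    return [
        (uri, seq, total, data[seq * page_size:(seq + 1) * page_size])
        for seq in range(total)
    ]


def reassemble(pages):
    """Rebuild a resource from pages received in any order.

    Returns None if no pages were received or some are still missing.
    """
    if not pages:
        return None
    total = pages[0][2]
    by_seq = {seq: payload for _, seq, _, payload in pages}
    if len(by_seq) != total:
        return None
    return b"".join(by_seq[seq] for seq in range(total))


def flat_cycle(resources, page_size):
    """Endless iterator over all pages of all hot resources, in a fixed order."""
    pages = [
        page
        for uri, data in resources.items()
        for page in chunk_resource(uri, data, page_size)
    ]
    return cycle(pages)
```

Because the broadcast is cyclic, a client that starts listening in the middle of a cycle still collects every page of a resource within one full cycle, and `reassemble` accepts the pages in whatever order they arrived.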
This project incorporates both applied and fundamental research investigations. On the applied research side, it focuses on the design and implementation of middleware for multicast data dissemination that transparently provides applications with a flexible array of data management services. These services include document selection, scheduling, indexing, caching, and consistency maintenance. The outline of our architecture is shown in Figure 1 (the transport layer is any one of the protocols that are available independently of this project, and the objective of the transport adaptation layer is to enable the middleware to interact with different types of multicast transport through a uniform interface). This approach has the benefit of unifying several algorithms and techniques from data management as services in one software distribution. As a result, established and new methods from data management will be more generally available to application developers.
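The role of the transport adaptation layer can be illustrated with a small interface sketch. The text does not define the middleware's actual interface, so the class and method names below are hypothetical, and the in-memory `LoopbackTransport` merely stands in for a real multicast protocol so that the example runs.

```python
from abc import ABC, abstractmethod


class MulticastTransport(ABC):
    """Hypothetical uniform interface that the middleware programs against."""

    @abstractmethod
    def send(self, channel, page):
        """Transmit one page to every subscriber of the channel."""

    @abstractmethod
    def subscribe(self, channel, callback):
        """Register a callback invoked for each page sent on the channel."""


class LoopbackTransport(MulticastTransport):
    """In-memory stand-in for a real multicast protocol (illustration only)."""

    def __init__(self):
        # Maps channel name -> list of subscriber callbacks.
        self._subscribers = {}

    def send(self, channel, page):
        for callback in self._subscribers.get(channel, []):
            callback(page)

    def subscribe(self, channel, callback):
        self._subscribers.setdefault(channel, []).append(callback)
```

With an interface of this kind, services such as scheduling or caching depend only on `MulticastTransport`, and supporting another transport protocol means adding one more subclass.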
On the fundamental research side, besides new algorithms and optimizations for each component, this project aims to provide insight into the functional synergy of the different components. An integrated approach highlights gaps in the state of the art and leads to new research problems and solutions. In particular, this middleware exposes the performance and functional trade-offs between existing data management algorithms and existing or proposed multicast protocols. Consequently, our approach suggests new research issues in the field of middleware technology.