The POSSE Project

Alexander L. Wolf
Benjamin G. Zorn

Department of Computer Science
University of Colorado, Boulder

Contact Information

Benjamin G. Zorn
Department of Computer Science, Campus Box 430
University of Colorado
Boulder, CO 80309
Phone: (303) 492-4398
Fax: (303) 492-2844
Email:  zorn@cs.colorado.edu

WWW PAGE

The POSSE Project Home Page

Keywords

Garbage collection, storage management, object databases, trace generation, benchmarks, persistent objects, performance measurement

Project Award Information

Project Summary

The POSSE (Persistent Object SyStem Evaluation) Project seeks to design, implement, and evaluate alternative implementation methods for persistent object systems. Our current work focuses specifically on storage management in such systems. Our contributions to date fall into two broad categories: new garbage collection policies for object database systems, and infrastructure that makes evaluations of persistent object system implementations easier to conduct and more accurate.

The garbage collection policies we have developed to date include new methods for selecting which partition to collect in a partition-based garbage collector, and a semi-automatic algorithm for setting the garbage collection rate that balances collector overhead against wasted storage space. We are currently investigating the relationship between object copying and collector performance, and we are also interested in the performance of systems in which large numbers of irregularly sized objects (e.g., WWW pages) are created and rapidly become garbage.
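To make the two kinds of policy concrete, the following is a minimal Python sketch under stated assumptions. The partition-selection heuristic (collect the partition that has had the most incoming pointers overwritten since it was last collected, since each overwrite may have left garbage behind) is one plausible policy of this kind, not necessarily the one the project published. The rate controller is likewise a hypothetical feedback rule: it lengthens or shortens the interval between collections depending on how much garbage the last collection found relative to a target fraction. All names (`Partition`, `record_pointer_overwrite`, `select_partition`, `next_interval`, `target_garbage`) are illustrative; the project's actual rate algorithm is described in the SIGMOD 1996 paper listed under Project References.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical bookkeeping for one partition of the object store."""
    pid: int
    # Pointers into this partition overwritten since it was last collected;
    # each one may have disconnected objects stored here.
    overwritten_in: int = 0


def record_pointer_overwrite(partitions, old_target_pid):
    """Call when a pointer to an object in partition old_target_pid is overwritten."""
    partitions[old_target_pid].overwritten_in += 1


def select_partition(partitions):
    """Pick the partition most likely to hold garbage (heuristic assumption)."""
    victim = max(partitions.values(), key=lambda p: p.overwritten_in)
    victim.overwritten_in = 0  # start counting afresh once it is collected
    return victim.pid


def next_interval(interval, garbage_reclaimed, bytes_collected,
                  target_garbage=0.10, min_interval=1, max_interval=10_000):
    """Adapt the number of application events between collections.

    More garbage found than the target fraction means storage is being
    wasted, so collect sooner; less means the collector is doing work for
    little gain, so collect later.
    """
    if bytes_collected == 0:
        return interval
    found = garbage_reclaimed / bytes_collected
    scale = target_garbage / found if found > 0 else 2.0
    # Limit each adjustment so one noisy sample cannot swing the rate wildly.
    scale = min(2.0, max(0.5, scale))
    return int(min(max_interval, max(min_interval, interval * scale)))
```

For example, if a collection of 1000 bytes reclaims 300 bytes (30% garbage against a 10% target), `next_interval(100, 300, 1000)` halves the interval to 50; if it reclaims only 10 bytes, the interval doubles to 200. The clamping is a design choice for stability, not something the source specifies.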

Our work on evaluation infrastructure is motivated by the lack, or inaccessibility, of object database and persistent object system benchmarks that exhibit complex time-varying behavior. Such behavior is likely to strongly influence the performance of garbage collection in these systems, yet the benchmarks most commonly used for such evaluations are the OO7 benchmark and microbenchmarks based on simple, regular object interconnections.

To address this lack of realistic applications, we have developed several tools. All of our infrastructure is built around a trace file format we developed, the POSSE Trace File Format (PTFF), which captures the structure and behavior of database applications. These traces have several uses; in particular, they can drive simulations and performance evaluations of different storage system implementations.
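The idea of a trace driving a simulation can be sketched in a few lines of Python. The one-event-per-line text format below (`create`, `write`, `root`) is a hypothetical stand-in invented for illustration; it is not the PTFF syntax, and `ReplayState` and `replay` are likewise illustrative names. A storage-system simulator would hook its placement and collection decisions into each event as it is replayed; here the replay only rebuilds the object graph and counts pointer overwrites.

```python
from dataclasses import dataclass, field

# Hypothetical trace: "create <id> <bytes>", "write <src> <slot> <dst>",
# "root <id>". NOT the PTFF syntax; it only illustrates the kind of
# structural and behavioral information such a trace records.
SAMPLE_TRACE = """\
create 1 64
create 2 128
create 3 32
write 1 0 2
write 1 0 3
root 1
"""


@dataclass
class ReplayState:
    sizes: dict = field(default_factory=dict)   # object id -> size in bytes
    slots: dict = field(default_factory=dict)   # (object id, slot) -> target id
    roots: set = field(default_factory=set)     # persistent root objects
    overwrites: int = 0                         # stores that replaced an old pointer


def replay(lines, state=None):
    """Apply trace events in order, as a simulator's front end would."""
    if state is None:
        state = ReplayState()
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        op, args = parts[0], [int(a) for a in parts[1:]]
        if op == "create":
            oid, size = args
            state.sizes[oid] = size
        elif op == "write":
            src, slot, dst = args
            if (src, slot) in state.slots:
                state.overwrites += 1  # the old target may now be garbage
            state.slots[(src, slot)] = dst
        elif op == "root":
            state.roots.add(args[0])
        else:
            raise ValueError(f"unknown trace event: {op!r}")
    return state
```

Replaying the sample with `replay(SAMPLE_TRACE.splitlines())` yields 224 allocated bytes and one pointer overwrite (object 2 is no longer referenced once slot 0 of object 1 is redirected to object 3).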

We have constructed an ODBMS implementation simulator, Odbsim, which we have used to compare alternative garbage collection implementation strategies. The rest of our infrastructure generates representative structure and behavior traces from synthetic and real object database and persistent object system applications. Specifically, we have constructed BIT, a Java bytecode instrumentation tool, to capture execution traces from persistent Java applications. We have also developed Tragedy, a synthetic benchmark tool that lets us quickly construct representative object database behaviors and transparently generate PTFF trace files.
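A synthetic benchmark generator of the general kind described for Tragedy can be sketched as a seeded random process that builds a connected object structure and then overwrites pointers, which is what turns reachable objects into garbage over time. This is a minimal sketch under stated assumptions, not Tragedy's design: the function name `synthetic_trace`, its parameters, and the one-event-per-line text format (`create`, `write`, `root`) are all hypothetical, and the output is not PTFF.

```python
import random


def synthetic_trace(n_objects, n_overwrites, seed=0, max_size=256):
    """Generate a hypothetical event stream: build a tree, then mutate it.

    Seeding the generator makes every run reproducible, which matters
    when the same workload must be replayed against several collectors.
    """
    rng = random.Random(seed)
    events = []
    slots = []  # (source object, slot) pairs that currently hold a pointer
    for oid in range(1, n_objects + 1):
        events.append(f"create {oid} {rng.randint(16, max_size)}")
        if oid > 1:
            # Link the new object from a random earlier one; using the
            # child's id as the slot number keeps slots from colliding.
            parent = rng.randint(1, oid - 1)
            slots.append((parent, oid))
            events.append(f"write {parent} {oid} {oid}")
    events.append("root 1")
    for _ in range(n_overwrites if slots else 0):
        # Redirect an existing pointer; whatever it used to reach may now
        # be unreachable, i.e., garbage the collector must find.
        src, slot = rng.choice(slots)
        events.append(f"write {src} {slot} {rng.randint(1, n_objects)}")
    return events
```

Calling `synthetic_trace(5, 3, seed=1)` produces 13 events: 5 creations, 4 tree-building pointer stores, 1 root declaration, and 3 overwrites. A more representative generator would add phases with different allocation and mutation rates to model the time-varying behavior discussed above.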

Goals, Objectives, and Targeted Activities

We are interested in understanding the behavior of object database systems and persistent object systems so that we can improve their performance.  Our specific focus has been on developing new algorithms for storage management in these systems.   Because there are no existing object database benchmarks that exhibit complex time-dependent behavior, our current work has three major subgoals:

Indication of Success

Our project has the specific goal of developing and evaluating algorithms for storage management in object database systems and persistent object systems, and the meta-goal of developing and distributing evaluation infrastructure to help other researchers with their own evaluations.

Our work on evaluation infrastructure has so far had the broadest impact outside our project. BIT, the Java bytecode instrumentation tool we developed, is currently used by individuals at seven organizations, including Harvard, Princeton, the University of Massachusetts, and the University of California at San Diego, as well as industrial research labs. We have developed additional tools based on BIT, including a trace-generation tool that collects behavior data from the Java implementation of ODI's Persistent Storage Engine, and we anticipate releasing these tools in the near future. We also anticipate releasing database behavior traces in the PTFF format, along with the source code of the Tragedy tool, so that others can generate database behaviors and PTFF trace files.

Our project has been successful at both of its stated goals. The storage management algorithms we have investigated have influenced other research in the field, and papers describing the work have been published in highly visible venues. The evaluation infrastructure we have developed, and continue to develop, has already been used by researchers at other institutions and supports the important goal of improving the quality of performance evaluation for persistent object systems. It is especially important that such work be supported with research funding, because if the same work were conducted in an industrial environment, there would be less incentive to develop infrastructure that could be shared by the research community.

Project Success and Impact

The PhD students who have participated in this project include:

In the course of this project, we have interacted with individuals from the following organizations:

The POSSE project also spawned an additional activity: an undergraduate, Brian Cooper, was supported with an NSF Research Experience for Undergraduates (REU) award. Using BIT, developed under the POSSE grant, he has been developing a Java meta-tool that allows custom profilers to be constructed rapidly.

Project References

Thorna Humphries, Artur Klauser, Alexander Wolf, and Benjamin Zorn, "A Framework for Evaluating Object Database Management System Implementations", February 1998, in preparation.

Thorna Humphries, Artur Klauser, Alexander Wolf, and Benjamin Zorn, "The POSSE Trace File Format", February 1998, in preparation.

J.E. Cook, A.L. Wolf, and B.G. Zorn, "A Highly Effective Partition Selection Policy for Object Database Garbage Collection", IEEE Transactions on Knowledge and Data Engineering, January 1998.

Han Bok Lee and Benjamin Zorn, "BIT: A Tool for Instrumenting Java Bytecodes", Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS97), pp. 73-82, Monterey, CA, December 1997.

Han Bok Lee and Benjamin G. Zorn, "Bytecode Instrumentation as an Aid in Understanding the Behavior of Java Persistent Stores", OOPSLA'97 Workshop on Memory Management and Garbage Collection, Atlanta, GA, October 1997.

Thorna O. Humphries, Alexander L. Wolf, and Benjamin G. Zorn, "A Framework for Storage Management in Persistent Object Systems", OOPSLA'97 Workshop on Memory Management and Garbage Collection, Atlanta, GA, October 1997.

J.E. Cook, A. Klauser, A.L. Wolf, and B.G. Zorn, "Semi-automatic, Self-adaptive Control of Garbage Collection Rates in Object Databases", Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pp. 377-388, Montreal, Canada, June 1996.

Jonathan E. Cook, Alexander L. Wolf, and Benjamin G. Zorn, "Partition Selection Policies in Object Database Garbage Collection", Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, pp. 371-382, Minneapolis, MN, May 1994.

Area Background

Object database management systems (ODBMSs) can be viewed as an integration of research results from the areas of database management systems and programming language systems. The goal of the integration is to support the definition of a richer set of types for persistent data and to support the manipulation of those data using a more powerful, programming-language-like model of computation. Two concepts that were extensively developed in the programming language area and that are now profitably employed in ODBMSs are direct support for complex, highly interconnected data and a notion of object identity separate from object value. 

The introduction of these two concepts, however, has greatly complicated a critical performance aspect of database management systems: reclaiming storage for persistent, secondary-memory objects that are no longer accessible. As decades of experience with network and relational databases have shown, inaccessible data, while not affecting the functional behavior of an application, can degrade its performance, since such data increase the effective size of the database and can increase access time. As a result, most database systems provide a "compact" or "reorg" operation that lets database administrators reduce storage usage and improve locality of reference.

Designers of ODBMSs actually have some choice in how storage is reclaimed. The two most basic options are manual reclamation, in which the application issues explicit deallocation commands, and automatic reclamation, in which the underlying management system itself directs the process of "discovering" objects that have implicitly become inaccessible.
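Automatic reclamation rests on reachability: an object is garbage when no chain of references leads to it from a persistent root. The following is a minimal in-memory sketch of that idea, a breadth-first mark from the roots. It deliberately ignores what makes the problem hard for ODBMSs (disk I/O, partitioning, concurrency, and recovery), and the data layout (`objects`, `pointers`, `roots`) and the name `find_garbage` are assumptions chosen for illustration.

```python
from collections import deque


def find_garbage(objects, pointers, roots):
    """Return the ids of objects not reachable from any root.

    objects:  iterable of all object ids in the store
    pointers: dict mapping an object id to the ids it references
    roots:    ids the application can name directly (persistent roots)
    """
    marked = set()
    work = deque(roots)
    while work:
        oid = work.popleft()
        if oid in marked:
            continue  # already visited; this also terminates on cycles
        marked.add(oid)
        work.extend(pointers.get(oid, ()))
    # Anything never marked is inaccessible and may be reclaimed.
    return set(objects) - marked
```

With `find_garbage({1, 2, 3, 4, 5}, {1: [2], 2: [3], 4: [5], 5: [4]}, [1])` the result is `{4, 5}`: objects 4 and 5 point to each other but are unreachable from the root, a cyclic structure that a tracing approach reclaims and simple reference counting would not.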

Storage management has been studied for three decades by designers of programming language systems in the realm of transient, primary-memory (heap) objects. Only recently has strong interest developed in formulating storage reclamation techniques commensurate with the size, complexity, and stability characteristics of persistent, secondary-memory objects in ODBMSs. In fact, it has been shown that the reclamation algorithms developed for programming language systems are, as currently formulated, inappropriate for use in object databases.

Area References

L. Amsaleg, M. Franklin, and O. Gruber, "Efficient Incremental Garbage Collection for Client-Server Object Database Systems", Proceedings of the 21st VLDB Conference, Zurich, Switzerland, September 1995.

Jack Campin and Malcolm Atkinson, "A Persistent Store Garbage Collector with Statistical Facilities", Persistent Programming Research Report 29, Department of Computing Science, University of Glasgow, Glasgow, Scotland, 1986.

Elliot Kolodner and William Weihl, "Atomic Incremental Garbage Collection and Recovery for a Large Stable Heap", Proceedings of the ACM SIGMOD International Conference on the Management of Data, pp. 177-186, Washington, DC, June 1993.

U. Maheshwari and B. Liskov, "Partitioned Garbage Collection of a Large Object Store", Proceedings of the ACM SIGMOD International Conference on the Management of Data, pp. 313-323, May 1997.

Voon-Fee Yong, Jeffrey Naughton, and Jie-Bing Yu, "Storage Reclamation and Reorganization in Client-Server Persistent Object Stores", Proceedings of the 10th International Conference on Data Engineering, pp. 120-131, February 1994.

Paul R. Wilson, "Uniprocessor Garbage Collection Techniques", Proceedings of the International Workshop on Memory Management (IWMM92), St. Malo, France, September 1992.

Potential Related Projects

OOPS Research Group, University of Texas.

Object Systems Laboratory, University of Massachusetts.

Glasgow University's Persistence and Distribution Group.

Barbara Liskov's Programming Methodology Group, MIT.