Object Store Garbage Collection

J. Eliot B. Moss

Contact Information

Department of Computer Science
University of Massachusetts
Amherst, MA 01003-4610
Phone: (413) 545-4206
Fax : (413) 545-1249
Email: moss@cs.umass.edu

PI's WWW PAGE

Web page for Eliot Moss

Keywords

storage management, garbage collection, object stores, persistence, persistent programming languages, database programming languages

Project Award Information

Project Summary

The project will extend an existing block-by-block main-memory garbage collection scheme, called Mature Object Space, for use in collecting persistent object stores. Less technically, the scheme will locate and reclaim memory containing no longer accessible data in databases. This will relieve programmers of the burdensome and error-prone task of indicating which memory can be reused, thus allowing better programming styles. Our algorithm is the first to work block by block (important for integrating nicely with databases) and to guarantee to find all inaccessible memory. It can also help reorganize (recluster) data and thus improve store performance. The algorithm is incremental, with a coarse granularity (collecting one block at a time).
A significant contribution of the research will be developing the algorithm and arguments for its correctness. Perhaps more significant will be the implementation and measurement of the algorithm, and investigation of how to incorporate reclustering.
The research is the investigator's part of a collaboration between the University of Massachusetts (Amherst) and the University of St.~Andrews (Scotland). However, the research will proceed even if the collaboration does not achieve expectations. The primary benefit of the collaboration is to implement and measure the algorithm in more situations.

Goals, Objectives, and Targeted Activities

We have these goals for the year: to finish infrastructure for supporting the object store garbage collector implementation in our laboratory; to assist colleagues building an implementation in a different system (useful to us for purposes of comparison); and to continue gc algorithm design. Our infrastructure consists of our own Java virtual machine, extended to support persistence. As of this writing, the virtual machine runs some programs successfully and is undergoing more thorough shaking down. Meanwhile, we have implemented the primary mechanisms to support persistence (detection of attempts to access non-resident objects, reading of objects from the persistent store, noting of updates to objects, and writing of objects back to the store). We have extended Java's reflective capabilities in our virtual machine, so we have considerable opportunity for optimizations, both of the fundamental persistence primitives and of higher level constructs (analogous to query optimization). We will also be able to interact with other object stores or databases. There are many interesting directions we are pursuing, not all directly related to the original project.
Colleagues from the University of St. Andrews (Scotland) and the University of Adelaide (Australia) are implementing our object store gc algorithm in the context of the Napier persistent programming language, and hope to have initial results soon.
On the algorithm design front, we have completed design and starting publishing aspects of a distributed version of our gc algorithm, though it does not directly address issues of persistence. Over the next year we anticipate devising more than one strategy for putting together the persistent gc algorithm and the distributed gc algorithm, e.g., for tightly coupled multiprocessors (such as clusters or groups of clusters used to support servers) and loosely coupled systems (e.g., client-server systems).

Indication of Success

We will have made demonstrable progress on three fronts: implementing our persistent Java infrastructure to support the gc algorithm and implementation research; implementing the algorithm in the context of another somewhat different persistent ovbject system (Napier); and devising and publishing relevant gc algorithms and refinements.
If the project objectives are stated narrowly, then are progress is modest: we have yet actually to implement the Persistent Mature Object Space algorithm much less to evaluate its performance, though there is promise of at least having the algorithm implemented within the life of the project. On the other hand, we continue to make valuable algorithmic innovations, and have received some interest from industrial contacts. In terms of the infrastructural work, the real impact may come in other directions, specifically the design and exploitation of new reflective mechanisms and techniques. We believe these will lead to the incorporation of the analog of query optimization into a persistent programming language implementation, without adding a new query language---in fact, without making any language changes to Java at all!
We will also support better integration of Java with object stores and databases, with more transparency, and yet at the same time ultimately with more programmer control, than that afforded by the direct use of "database connectivity" libraries, such as JDBC/ODBC. As is often the case with research, the benefits do not necessarily come from the initial objectives, but from opportunities that develop along the way.

Project Impact

The award has enabled our Java virtual machine development, which is having a positive impact on several diverse collaborative research project within our department, on three significant industrial collaborations (Sun, Intel, and IBM), and three international collaborations (with the University of St. Andrews, Monash University, and the Australian National University). The prospect of being able to write Java programs that manipulate huge volumes of disk resident data, with high transparency, and with sophisticated optimizations to achieve good cpu and I/O performance, is exciting and is energizing many important connection and collaborations. The original goals related to storage management are important and interesting, but these other serendipitous outgrowths of the funded research may have greater significance.

Project References

While we have no publications yet describing the project as it stands, the papers cited below give useful background on our work in this direction. We have grouped them into categories with subheadings.

Garbage Collection of Persistent Stores

J. Eliot B. Moss, David S. Munro, and Richard L. Hudson, ``PMOS: A Complete and Coarse-Grained Incremental Garbage Collector for Persistent Object Stores'', Seventh International Workshop on Persistent Object Systems, Cape May, NJ, May 1996.

Persistent Object Stores and Persistent Programming Languages

Antony L. Hosking and J. Eliot B. Moss, ``Object Fault Handling for Persistent Programming Languages'', ACM Conference on Object Oriented Programming Systems, Languages, and Applications (OOPSLA '93), Washington, DC, October 1993, 288-303.

Antony L. Hosking, Eric Brown, and J. Eliot B. Moss, ``Update Logging for Persistent Programming Languages: A Comparative Performance Evaluation'', Nineteenth International Conference on Very Large Data Bases, Dublin, Ireland, August 1993, 429-440.

J. Eliot B. Moss, ``Working with Persistent Objects: To Swizzle or Not to Swizzle'', IEEE Transactions on Software Engineering, Volume 18, Number 8, August 1992, 657-673.

J. Eliot B. Moss and Antony L. Hosking, ``Approaches to Adding Persistence to Java'', First International Workshop on Persistence and Java, September 1996, Drymen, Scotland.

J. Eliot B. Moss, Antony L. Hosking, ``Expressing Object Residency Optimizations Using Pointer Type Annotations.'' Proceedings of the Sixth International Workshop on Persistent Object Systems, Tarascon, France, September 1994, published in series Workshops in Computing, Springer-Verlag, 3-15.

Antony L. Hosking and J. Eliot B. Moss, ``Towards Compile-Time Optimisations for Persistence'', Fourth International Workshop on Persistent Object Systems, Martha's Vineyard, September 1990, Morgan Kaufmann, 17-27.

J. Eliot B. Moss, ``Design of the Mneme Persistent Object Store'', ACM Transactions on Information Systems, Volume 8, Number 2, April 1990, 103-139.

Main Memory Garbage Collection

Antony L. Hosking and J. Eliot B. Moss, ``Protection Traps and Alternatives for Memory Management of an Object-Oriented Language'', ACM Symposium on Operating Systems Principles, Asheville, NC, December 1993, 106-119.

Antony L. Hosking, J. Eliot B. Moss, and Darko Stefanovic, ``A Comparative Performance Evaluation of Write Barrier Implementations'', ACM Conference on Object Oriented Programming Systems, Languages, and Applications, (OOPSLA '92), Vancouver, BC, October 1992, 92-109.

Richard L. Hudson and J. Eliot B. Moss, ``Incremental Collection of Mature Objects'', Int'l Workshop on Memory Management, St. Malo, France, Sept. 1992, published as LNCS 637, Springer-Verlag, 388-403.

Amer Diwan, J. Eliot B. Moss, and Richard L. Hudson, ``Compiler Support for Garbage Collection in a Statically Typed Language'', SIGPLAN '92, San Francisco, CA, 273-282.

Richard L. Hudson, J. Eliot B. Moss, Amer Diwan, Christopher F. Weight, ``A Language Independent Garbage Collector Toolkit'', COINS TR 91-47, 1991.

Distributed Garbage Collection

Richard L. Hudson, Ron Morrison, J. Eliot B. Moss, David S. Munro, ``Garbage Collecting the World: One Car at a Time'', Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, (OOPSLA '97), Atlanta, GA, October 1997, 14 pp.

Application to Computer Vision

Gokhan Kutlu, Bruce Draper, J. Eliot B. Moss, and Edward Riseman, ``Support Tools for Visual Information Management'', Fifth Symposium on Document Analysis and Information Retrieval, (SDAIR '96), Las Vegas, NV, April 1996, 101-112.

Area Background

There are several topics discussed in this report. First, consider storage management. We are concerned with systems that manage large numbers of objects in persistent storage (typically disk). The objects may point to one another, just like programming language objects as in Java, Smalltalk, C++, etc. there are some distinguished roots. We wish to retain all objects reachable from the roots via chains of pointer references; the space for all other objects should be reclaimed and reused. In sum, we are talking about automatic storage management, which avoids the dangling reference and storage leak problems of explicitly managed memory. That is, a certain kind of referential integrity is enforced by the system at a low level. Note that one will still insert and remove objects in collections, analogously to insert and deleting tuples in relations, but we have separated the collections and storage management. The benefits are correct and natural operation within rich graph-connected structures, and automation of a tedious and potentially error-prone task. garbage collection can also naturally recluster objects as it works. The challenge is to make automatic storage management efficient. With respect to management of persistent storage, see the references below, in addition to ours above, to come close to the state of the art.

We have also talked about Java virtual machines, reflection, and optimization. One of the interesting things about Java, is that it is compiled to a standard virtual instruction set. This set is typed, and indeed one can type check code as it is loaded, e.g., from a network, to avoid type errors. The resulting code can be quite portable. A virtual machine provides all the facilities to execute the code, which may be by interpreting the virtual machine, by compiling to the host processor's native instruction set, or a combination of the two. One can perform significant optimizations starting from the virtual instruction set code: it is essentially equivalent to a portable compiler intermediate form. Thus, unlike with C++ (for example), it is easier to build a system that can do the dynamic sorts of transformations done in traditional query optimization.

Reflection is the ability of a system to inspect and modify its own behavior. We have extended Java's reflective capabilities to allow inspection of code, which then admits producing new, optimized, code at run time, writing in Java. Put another way, rather than incoporating the query optimizing into the virtual machine "under the covers", we add a hook to the virtual machine and write the optimizer in Java. Obvkously we also provide means to insert the optimized code back into the system and run it, but this is little different from the usual Java dynamic loading.

The optimizations we have in mind proceed by (a) identifying and extracting queries from the Java program, (b) transforming those parts of a program that we extract as queries, and (c) putting the optimized code back into the system and executing it. Clearly we can cache optimizied forms of code, etc., to amortize optimization effort, as is done in existing database systems. The goal is to achieve in a single language framework (e.g., Java) all the benefits of mixed language programming (e.g., Java and SQL) and with much less grief and potentially higher performance.

Area References

Here are some references that might be of use, beyond the works we cited above. They are in alphabetical order by author.

Laurent Amsaleg, Michael Franklin, and Olivier Gruber. Efficient incremental garbage collection for workstation/server database systems. TR CS-TR-3370, Dept. of Comp. Sci., Univ. of Maryland, College Park, MD, 1994.

Jonathan E. Cook, Artur W. Klauser, Alexander L. Wolf, and Benjamin G. Zorn. Effectively controlling garbage collection rates in object databases. TR CU-CS-758-94, Dept. of Comp. Sci., Univ. of Colo., Boulder, CO, 1994.

Jonathan E. Cook, Alexander L. Wolf, and Benjamin G. Zorn. Partition selection policies in object database garbage collection. SIGMOD '94, 371-382, Minneapolis, MN.

Elliot Kolodner, Barbara Liskov, and William Weihl. Atomic garbage collection: Managing a stable heap. SIGMOD '89, 15-25, Portland, OR.

Jacob Seligmann and Steffen Grarup. Incremental mature garbage collection using the train algorithm. ECOOP '95, LNCS 952, 235-252, Aarhus, Denmark. Springer-Verlag.

Potential Related Projects

I am not aware of any specific projects suitable for collaboration, but the obvious directions would be people working on Java, and optimization or persistence for it, people working on optimizing persistent programming languages similar to Java, and others working on automatic storage management for persistent and possibly distributed systems