Object Store Garbage Collection
J. Eliot B. Moss
Contact Information
Department of Computer Science
University of Massachusetts
Amherst, MA 01003-4610
Phone: (413) 545-4206
Fax : (413) 545-1249
Email: moss@cs.umass.edu
PI's WWW PAGE
Web page for Eliot Moss
Keywords
storage management, garbage collection, object stores, persistence, persistent
programming languages, database programming languages
Project Award Information
- Award Number: IRI-9632284
- Duration: 3 years, in 2nd year
- Title: Object Store Garbage Collection
Project Summary
The project will extend an existing block-by-block main-memory garbage
collection scheme, called Mature Object Space, for use in collecting
persistent object stores. Less technically, the scheme will locate and reclaim
memory containing no longer accessible data in databases. This will relieve
programmers of the burdensome and error-prone task of indicating which memory
can be reused, thus allowing better programming styles. Our algorithm is the
first to work block by block (important for integrating nicely with databases)
and to guarantee to find all inaccessible memory. It can also help reorganize
(recluster) data and thus improve store performance. The algorithm is
incremental, with a coarse granularity (collecting one block at a time).
A significant contribution of the research will be developing the algorithm
and arguments for its correctness. Perhaps more significant will be the
implementation and measurement of the algorithm, and investigation of how to
incorporate reclustering.
The research is the investigator's part of a collaboration between the
University of Massachusetts (Amherst) and the University of St.~Andrews
(Scotland). However, the research will proceed even if the collaboration does
not achieve expectations. The primary benefit of the collaboration is to
implement and measure the algorithm in more situations.
Goals, Objectives, and Targeted Activities
We have these goals for the year: to finish infrastructure for supporting the
object store garbage collector implementation in our laboratory; to assist
colleagues building an implementation in a different system (useful to us for
purposes of comparison); and to continue gc algorithm design. Our
infrastructure consists of our own Java virtual machine, extended to support
persistence. As of this writing, the virtual machine runs some programs
successfully and is undergoing more thorough shaking down. Meanwhile, we have
implemented the primary mechanisms to support persistence (detection of
attempts to access non-resident objects, reading of objects from the
persistent store, noting of updates to objects, and writing of objects back to
the store). We have extended Java's reflective capabilities in our virtual
machine, so we have considerable opportunity for optimizations, both of the
fundamental persistence primitives and of higher level constructs (analogous
to query optimization). We will also be able to interact with other object
stores or databases. There are many interesting directions we are pursuing,
not all directly related to the original project.
Colleagues from the University of St. Andrews (Scotland) and the University of
Adelaide (Australia) are implementing our object store gc algorithm in the
context of the Napier persistent programming language, and hope to have
initial results soon.
On the algorithm design front, we have completed design and starting
publishing aspects of a distributed version of our gc algorithm, though it
does not directly address issues of persistence. Over the next year we
anticipate devising more than one strategy for putting together the persistent
gc algorithm and the distributed gc algorithm, e.g., for tightly coupled
multiprocessors (such as clusters or groups of clusters used to support
servers) and loosely coupled systems (e.g., client-server systems).
Indication of Success
We will have made demonstrable progress on three fronts: implementing our
persistent Java infrastructure to support the gc algorithm and implementation
research; implementing the algorithm in the context of another somewhat
different persistent ovbject system (Napier); and devising and publishing
relevant gc algorithms and refinements.
If the project objectives are stated narrowly, then are progress is modest: we
have yet actually to implement the Persistent Mature Object Space algorithm
much less to evaluate its performance, though there is promise of at least
having the algorithm implemented within the life of the project. On the other
hand, we continue to make valuable algorithmic innovations, and have received
some interest from industrial contacts. In terms of the infrastructural work,
the real impact may come in other directions, specifically the design and
exploitation of new reflective mechanisms and techniques. We believe these
will lead to the incorporation of the analog of query optimization into a
persistent programming language implementation, without adding a new query
language---in fact, without making any language changes to Java at all!
We will also support better integration of Java with object stores and
databases, with more transparency, and yet at the same time ultimately with
more programmer control, than that afforded by the direct use of "database
connectivity" libraries, such as JDBC/ODBC. As is often the case with
research, the benefits do not necessarily come from the initial objectives,
but from opportunities that develop along the way.
Project Impact
- Human Resources: The project is directly supporting one graduate
student, who has formed a PhD committee and expects to graduate in about a
year.
- Education and curriculum development at all levels: The primary
educational impact is on graduate students. The Java virtual machine
infrastructure is being used or worked on by four graduate students and one
undergraduate, and will be used by at least that many more once it is robust
enough for every day use. It connects with a range of other projects in
compilers, computer vision, and exciting new tools for wildlife
biologists. We also anticipate that the Java system will be used in course
offerings related to persistence and object-oriented databases.
- Departmental/institutional infrastructure: As explained under
"Education", the Java system is to be used in several overlapping
collaborations, and in teaching as well. It will have many uses, probably a
number that we cannot yet anticipate.
- Industry (collaborations, transfer of technology, patents): Our Java
work, in virtual machine development, in optimization, in garbage
collection, and in persistence, has resulted in three industrial
collaborations, two of which are quite new and just getting underway. We are
working with Sun Microsystems on issues related to gc; with
Intel on issues related to gc, persistence, and virtual machine performance;
and with IBM on interactions between optimization and gc.
The award has enabled our Java virtual machine development, which is having a
positive impact on several diverse collaborative research project within our
department, on three significant industrial collaborations (Sun, Intel, and
IBM), and three international collaborations (with the University of
St. Andrews, Monash University, and the Australian National University). The
prospect of being able to write Java programs that manipulate huge volumes of
disk resident data, with high transparency, and with sophisticated
optimizations to achieve good cpu and I/O performance, is exciting and is
energizing many important connection and collaborations. The original goals
related to storage management are important and interesting, but these other
serendipitous outgrowths of the funded research may have greater significance.
Project References
While we have no publications yet describing the project as it stands, the
papers cited below give useful background on our work in this direction. We
have grouped them into categories with subheadings.
Garbage Collection of Persistent Stores
J. Eliot B. Moss, David S. Munro, and Richard L. Hudson, ``PMOS: A Complete
and Coarse-Grained Incremental Garbage Collector for Persistent Object
Stores'', Seventh International Workshop on Persistent Object Systems,
Cape May, NJ, May 1996.
Persistent Object Stores and Persistent Programming Languages
Antony L. Hosking and J. Eliot B. Moss, ``Object Fault Handling for Persistent
Programming Languages'', ACM Conference on Object Oriented Programming
Systems, Languages, and Applications (OOPSLA '93), Washington, DC, October
1993, 288-303.
Antony L. Hosking, Eric Brown, and J. Eliot B. Moss, ``Update Logging for
Persistent Programming Languages: A Comparative Performance Evaluation'',
Nineteenth International Conference on Very Large Data Bases, Dublin,
Ireland, August 1993, 429-440.
J. Eliot B. Moss, ``Working with Persistent Objects: To Swizzle or Not to
Swizzle'', IEEE Transactions on Software Engineering, Volume 18, Number
8, August 1992, 657-673.
J. Eliot B. Moss and Antony L. Hosking, ``Approaches to Adding Persistence to
Java'', First International Workshop on Persistence and Java, September
1996, Drymen, Scotland.
J. Eliot B. Moss, Antony L. Hosking, ``Expressing Object Residency
Optimizations Using Pointer Type Annotations.'' Proceedings of the Sixth
International Workshop on Persistent Object Systems, Tarascon, France,
September 1994, published in series Workshops in Computing,
Springer-Verlag, 3-15.
Antony L. Hosking and J. Eliot B. Moss, ``Towards Compile-Time
Optimisations for Persistence'', Fourth International Workshop on
Persistent Object Systems, Martha's Vineyard, September 1990, Morgan
Kaufmann, 17-27.
J. Eliot B. Moss, ``Design of the Mneme Persistent Object Store'', ACM
Transactions on Information Systems, Volume 8, Number 2, April 1990,
103-139.
Main Memory Garbage Collection
Antony L. Hosking and J. Eliot B. Moss, ``Protection Traps and Alternatives
for Memory Management of an Object-Oriented Language'', ACM Symposium on
Operating Systems Principles, Asheville, NC, December 1993, 106-119.
Antony L. Hosking, J. Eliot B. Moss, and Darko Stefanovic, ``A Comparative
Performance Evaluation of Write Barrier Implementations'', ACM Conference
on Object Oriented Programming Systems, Languages, and Applications, (OOPSLA
'92), Vancouver, BC, October 1992, 92-109.
Richard L. Hudson and J. Eliot B. Moss, ``Incremental Collection of Mature
Objects'', Int'l Workshop on Memory Management, St. Malo, France,
Sept. 1992, published as LNCS 637, Springer-Verlag, 388-403.
Amer Diwan, J. Eliot B. Moss, and Richard L. Hudson, ``Compiler Support for
Garbage Collection in a Statically Typed Language'', SIGPLAN '92,
San Francisco, CA, 273-282.
Richard L. Hudson, J. Eliot B. Moss, Amer Diwan, Christopher F. Weight, ``A
Language Independent Garbage Collector Toolkit'', COINS TR 91-47, 1991.
Distributed Garbage Collection
Richard L. Hudson, Ron Morrison, J. Eliot B. Moss, David S. Munro, ``Garbage
Collecting the World: One Car at a Time'', Proceedings of the ACM SIGPLAN
Conference on Object-Oriented Programming Systems, Languages, and Applications,
(OOPSLA '97), Atlanta, GA, October 1997, 14 pp.
Application to Computer Vision
Gokhan Kutlu, Bruce Draper, J. Eliot B. Moss, and Edward Riseman,
``Support Tools for Visual Information Management'', Fifth Symposium on
Document Analysis and Information Retrieval, (SDAIR '96), Las Vegas, NV,
April 1996, 101-112.
Area Background
There are several topics discussed in this report. First, consider storage
management. We are concerned with systems that manage large numbers of objects
in persistent storage (typically disk). The objects may point to one another,
just like programming language objects as in Java, Smalltalk, C++, etc. there
are some distinguished roots. We wish to retain all objects reachable from the
roots via chains of pointer references; the space for all other objects should
be reclaimed and reused. In sum, we are talking about automatic storage
management, which avoids the dangling reference and storage leak problems of
explicitly managed memory. That is, a certain kind of referential integrity is
enforced by the system at a low level. Note that one will still insert and
remove objects in collections, analogously to insert and deleting tuples in
relations, but we have separated the collections and storage management. The
benefits are correct and natural operation within rich graph-connected
structures, and automation of a tedious and potentially error-prone
task. garbage collection can also naturally recluster objects as it works. The
challenge is to make automatic storage management efficient.
With respect to management of persistent storage, see the references below, in
addition to ours above, to come close to the state of the art.
We have also talked about Java virtual machines, reflection, and
optimization. One of the interesting things about Java, is that it is compiled
to a standard virtual instruction set. This set is typed, and indeed one can
type check code as it is loaded, e.g., from a network, to avoid type
errors. The resulting code can be quite portable. A virtual machine provides
all the facilities to execute the code, which may be by interpreting the
virtual machine, by compiling to the host processor's native instruction set,
or a combination of the two. One can perform significant optimizations
starting from the virtual instruction set code: it is essentially equivalent
to a portable compiler intermediate form. Thus, unlike with C++ (for example),
it is easier to build a system that can do the dynamic sorts of
transformations done in traditional query optimization.
Reflection is the ability of a system to inspect and modify its own
behavior. We have extended Java's reflective capabilities to allow inspection
of code, which then admits producing new, optimized, code at run time, writing
in Java. Put another way, rather than incoporating the query optimizing into
the virtual machine "under the covers", we add a hook to the virtual machine
and write the optimizer in Java. Obvkously we also provide means to insert the
optimized code back into the system and run it, but this is little different
from the usual Java dynamic loading.
The optimizations we have in mind proceed by (a) identifying and extracting
queries from the Java program, (b) transforming those parts of a program that
we extract as queries, and (c) putting the optimized code back into the
system and executing it. Clearly we can cache optimizied forms of code, etc.,
to amortize optimization effort, as is done in existing database systems. The
goal is to achieve in a single language framework (e.g., Java) all the
benefits of mixed language programming (e.g., Java and SQL) and with much less
grief and potentially higher performance.
Area References
Here are some references that might be of use, beyond the works we cited
above. They are in alphabetical order by author.
Laurent Amsaleg, Michael Franklin, and Olivier Gruber.
Efficient incremental garbage collection for workstation/server
database systems.
TR CS-TR-3370, Dept. of Comp. Sci.,
Univ. of Maryland, College Park, MD, 1994.
Jonathan E. Cook, Artur W. Klauser, Alexander L. Wolf, and Benjamin G. Zorn.
Effectively controlling garbage collection rates in object databases.
TR CU-CS-758-94, Dept. of Comp. Sci.,
Univ. of Colo., Boulder, CO, 1994.
Jonathan E. Cook, Alexander L. Wolf, and Benjamin G. Zorn.
Partition selection policies in object database garbage collection.
SIGMOD '94, 371-382, Minneapolis, MN.
Elliot Kolodner, Barbara Liskov, and William Weihl.
Atomic garbage collection: Managing a stable heap.
SIGMOD '89, 15-25, Portland, OR.
Jacob Seligmann and Steffen Grarup.
Incremental mature garbage collection using the train algorithm.
ECOOP '95, LNCS 952, 235-252, Aarhus, Denmark. Springer-Verlag.
Potential Related Projects
I am not aware of any specific projects suitable for collaboration, but the
obvious directions would be people working on Java, and optimization or
persistence for it, people working on optimizing persistent programming
languages similar to Java, and others working on automatic storage management
for persistent and possibly distributed systems