Query Optimization Engineering
David Maier*
Leonard Shapiro**
Contact Information
Department of Computer Science and Engineering*
Oregon Graduate Institute
P.O. Box 91000
Portland, OR 97921-1000
Phone: (503) 690-1154
FAX: (503) 690-1553
Email: maier@cse.ogi.edu
Computer Science Department**
Portland State University
P.O. Box 751
Portland, OR 97201-0751
Phone: (503)789-3081
FAX: (503)725-3211
Email: len@cs.pdx.edu
WWW PAGE
http://www.cs.pdx.edu/~len/columbia.html
Keywords
query, processing, optimization, benchmarks, performance, search,
models, operators
Project Award Information
Duration: Three years
Current year: first
Project Name: Query Optimization Engineering
This project consists of two awards, one to the Oregon Graduate
Institute with David Maier as Principal Investigator, and one to
Portland State University with Leonard Shapiro as Principal Investigator.
Project Summary
This project will explore techniques for optimizing database queries,
primarily queries from new application areas.
It will focus on engineering issues such as
- Which logical operators and rule sets work best for a particular query class?
- How well do different search strategies work with a given query model?
- What are the tradeoffs, relative to optimization time and plan quality, of search heuristics and limits on plan shape?
- How can optimizers handle variability in the runtime evaluation environment?
Goals, Objectives, and Targeted Activities
Documents referred to here are available on the project's web page. Our
proposal lists the first goal for year 1 as the development of operators
and rules for TPC-D queries. Completion of this goal is described in
the thesis of Keith Billings. Our second goal was to develop operators
and cost models for OQL. Quan Wang developed a cost model for OQL queries
that has been verified against the GemStone OODBMS, which takes into
account clustering and ordering information on objects. He is currently
developing and testing new operators for supporting nested data and nested
queries in OQL. Our third goal was to begin defining logical operators
for OLAP queries, which we have not as yet begun.
Our objectives for the coming year are:
1. Further testing of safe search-pruning strategies using Columbia, to
characterize their effectiveness with different query topologies and
database statistics.
2. Begin testing heuristic search-pruning strategies using Columbia, in
particular ones that are bounded: the found plan cost is withing a bounded
amount of the optimal plan.
3. Construct a cost-based optimizer for OQL using the Columbia framework.
4. Consider novel optimization strategies for TPC-D queries.
5. Development of optimization techniques for OLAP queries.
Indication of Success
An important tool we are using in these investigations, namely the
Cascades Query Optimization Framework, has been improved significantly in
performance and ease of use. This work is described in the thesis of
Yongwen Xu. We have been able to demonstrate the superior performance,
when measured by number of expressions generated, of top-down to
bottom-up optimizers. That work is described in a paper, "Group Pruning
in the Columbia Query Optimizer", submitted for publication.
As described here and in the previous section, the stated objectives
of the project have been accomplished thus far.
Project Impact and Outcome
- Human Resources
This grant has supported the work of graduate students Quan
Wang, Keith Billings and Yongwen Xu.
Quan Wang is progressing in his PhD program. He used his work on cost
estimation
to satisify is research proficiency exam, and is currently completing a thesis
proposal on the nested query work.
Keith is now working fulltime for
Informix's Portland development office and Yongwen is about to take a
position with Oracle's Portland development office. Keith and Yongwen
have recently received their MS degrees. Keith has continued to be
involved in the project since he graduated, advising Yongwen and
participating in project discussions. Yongwen wishes to stay similarly
involved with the project. These continued collaborations will not only help
the project, but will build bridges between these two major database companies
and our project.
- Education and Curriculum Deveelopment at all levels.
This grant has spawned a series of speakers in Portland, Oregon, on the
topic of query optimization. The series was funded by a local Oregon
program, and has been rated as the best attended series in recent
memory. Matriculated students and local citizenry attend the talks, thus
enriching both traditional education and continuing education.
- Department/institution infrastructure
Because Shapiro and Maier are at different institutions, the project
encourages collaboration between the institutions. The PIs and their
students meet weekly at OGI. Shapiro has an office at OGI. Because of
these interactions, Shapiro and Maier are much more aware of activities
at the other institutions and can serve as channels of information
between PSU and OGI. The end result is that the two institutions form
more of a critical mass in computer science.
- Industry -- collaborations, transfer of technology, patents.
To ensure that our work is relevant and does not duplicate current
art, we have been working with engineers from database companies in the
Portland-Seattle area, specifically
- Goetz Graefe and Surajit Chaudhuri, Microsoft
- Gary Kelly and Dave Clay, Oracle
- Jay Almarode, GemStone Systems, Inc.
- Seckin Unlu, Intel
- Bill McKenna, Red Brick Systems
- Pedro Cellis, Tandem Computers
who have promised to remain available to us during the current project.
These collaborators are valuable both in identifying critical problems
in commercial optimization and for evaluating the techniques we
develop. They also provide a conduit for our best results to influence
the design of commercial products.
Project References
Our work builds on lessons learned in the EREQ (DARPA) and Revelation
(NSF) projects, which are included in references below.
Several papers are available on the project web page.
S. Daniels, G. Graefe, T. Keller, D. Maier, D. Schmidt and B. Vance.
Query Optimization in Revelation, an Overview. IEEE Data Engineering.
Bulletin, June 1991.
L. Fegaras and D. Maier. Towards an Effective Calculus for Object Query
Languages. ACM-SIGMOD International Conference on Management of Data,
San Jose, May 1995.
G. Graefe, A. Linville and L. D. Shapiro. Sort versus Hash Revisited.
IEEE Trans. on Knowledge and Data Eng. December 1994.
G. Graefe, D. Maier, L. Shapiro, S. Daniels, T. Keller and B. Vance.
Issues in Distributed Object Assembly. In Distributed Object Management,
M.T. Ozsu, U. Dayal and P. Valduriez, editors, Morgan Kaufmann, 1994.
D. Maier, S. Daniels, T. Keller, B. Vance, G. Graefe and W. McKenna.
Challenges for Query Processing in Object-Oriented Databases. In Query
Processing for Advanced Database Applications, J. C. Freytag, G. Vossen
and D. Maier, editors, Morgan Kaufmann, 1994.
B. Vance and D. Maier, Rapid Bushy Join-order Optimization with Cartesian
Products, Proc. ACM SIGMOD Conf 1996.
Area Background
Query optimizers are one of the main means by which modern database
systems achieve their performance advantages. Given a request for data
manipulation or retrieval, an optimizer will choose an optimal plan for
evaluating the request from among the manifold alternative strategies.
Optimizers for commercial relational database systems appeared early in
the last decade, and optimization for the basic relational model was
considered a solved problem by many. However, new interest in query
capabilities for knowledge discovery and on-line analytical processing,
in large data warehouses, and against complex multimedia objects has
kindled renewed research in optimization. Current optimizers have often
proved inadequate to the needs of these new application areas.
The primary goal of this project is to answer several engineering
questions related to query optimization for new application areas.
These engineering questions are listed above under Project Summary.
Area References
J. A. Blakeley, W. J. McKenna and G. Graefe, Experiences Building
the Open OODB Query Optimizer, Proc. ACM SIGMOD Conf.,
Washington, D C, May 1993, 287.
P. Cellis, The Query Optimizer in Tandem's ServerWare SQL Product,
Proceedings of VLDB '96, Page 592.
G. Graefe, The Cascades Framework for Query Optimization, Bulletin of
the TC on Data Engineering, Vol 18 No. 3, September 1995, Pg 19-29
G. Graefe. Query Evaluation Techniques for Large Databases. ACM
Computing Surveys June 1993.
G. Graefe. Volcano, An Extensible and Parallel Dataflow Query Processing
System. IEEE Trans. on Knowledge and Data Eng. February 1994.
G. Graefe, The Cascades Framework for Query Optimization,
Bulletin of the TC on Data Engineering, Vol 18 No. 3, September 1995, Pg
19-29
G. Graefe and W. J. McKenna, The Volcano Optimizer Generator:
Extensibility and Efficient Search, Proc. IEEE Int'l. Conf. on
Data Eng., Vienna, Austria, April 1993, 209.
N. Kabra, D. DeWitt, OPT++: An Object-Oriented Implementation
for Extensible Database Query Optimization, to appear, VLDB Journal.
S. Zdonik and D. Maier, Readings in Object-Oriented Database
Systems, Morgan Kaufmann, San Mateo, CA, 1990.