Query Optimization Engineering

David Maier*
Leonard Shapiro**

Contact Information

Department of Computer Science and Engineering*
Oregon Graduate Institute
P.O. Box 91000
Portland, OR 97921-1000
Phone: (503) 690-1154
FAX: (503) 690-1553
Email: maier@cse.ogi.edu

Computer Science Department**
Portland State University
P.O. Box 751
Portland, OR 97201-0751
Phone: (503)789-3081
FAX: (503)725-3211
Email: len@cs.pdx.edu

WWW PAGE

http://www.cs.pdx.edu/~len/columbia.html

Keywords

query, processing, optimization, benchmarks, performance, search, models, operators

Project Award Information

Duration: Three years
Current year: first
Project Name: Query Optimization Engineering
This project consists of two awards, one to the Oregon Graduate Institute with David Maier as Principal Investigator, and one to Portland State University with Leonard Shapiro as Principal Investigator.

Project Summary

This project will explore techniques for optimizing database queries, primarily queries from new application areas. It will focus on engineering issues such as

Goals, Objectives, and Targeted Activities

Documents referred to here are available on the project's web page. Our proposal lists the first goal for year 1 as the development of operators and rules for TPC-D queries. Completion of this goal is described in the thesis of Keith Billings. Our second goal was to develop operators and cost models for OQL. Quan Wang developed a cost model for OQL queries that has been verified against the GemStone OODBMS, which takes into account clustering and ordering information on objects. He is currently developing and testing new operators for supporting nested data and nested queries in OQL. Our third goal was to begin defining logical operators for OLAP queries, which we have not as yet begun.
Our objectives for the coming year are:
1. Further testing of safe search-pruning strategies using Columbia, to characterize their effectiveness with different query topologies and database statistics.
2. Begin testing heuristic search-pruning strategies using Columbia, in particular ones that are bounded: the found plan cost is withing a bounded amount of the optimal plan.
3. Construct a cost-based optimizer for OQL using the Columbia framework.
4. Consider novel optimization strategies for TPC-D queries.
5. Development of optimization techniques for OLAP queries.

Indication of Success

An important tool we are using in these investigations, namely the Cascades Query Optimization Framework, has been improved significantly in performance and ease of use. This work is described in the thesis of Yongwen Xu. We have been able to demonstrate the superior performance, when measured by number of expressions generated, of top-down to bottom-up optimizers. That work is described in a paper, "Group Pruning in the Columbia Query Optimizer", submitted for publication. As described here and in the previous section, the stated objectives of the project have been accomplished thus far.

Project Impact and Outcome

Project References

Our work builds on lessons learned in the EREQ (DARPA) and Revelation (NSF) projects, which are included in references below.

Several papers are available on the project web page.

S. Daniels, G. Graefe, T. Keller, D. Maier, D. Schmidt and B. Vance. Query Optimization in Revelation, an Overview. IEEE Data Engineering. Bulletin, June 1991.

L. Fegaras and D. Maier. Towards an Effective Calculus for Object Query Languages. ACM-SIGMOD International Conference on Management of Data, San Jose, May 1995.

G. Graefe, A. Linville and L. D. Shapiro. Sort versus Hash Revisited. IEEE Trans. on Knowledge and Data Eng. December 1994.

G. Graefe, D. Maier, L. Shapiro, S. Daniels, T. Keller and B. Vance. Issues in Distributed Object Assembly. In Distributed Object Management, M.T. Ozsu, U. Dayal and P. Valduriez, editors, Morgan Kaufmann, 1994.

D. Maier, S. Daniels, T. Keller, B. Vance, G. Graefe and W. McKenna. Challenges for Query Processing in Object-Oriented Databases. In Query Processing for Advanced Database Applications, J. C. Freytag, G. Vossen and D. Maier, editors, Morgan Kaufmann, 1994.

B. Vance and D. Maier, Rapid Bushy Join-order Optimization with Cartesian Products, Proc. ACM SIGMOD Conf 1996.

Area Background

Query optimizers are one of the main means by which modern database systems achieve their performance advantages. Given a request for data manipulation or retrieval, an optimizer will choose an optimal plan for evaluating the request from among the manifold alternative strategies. Optimizers for commercial relational database systems appeared early in the last decade, and optimization for the basic relational model was considered a solved problem by many. However, new interest in query capabilities for knowledge discovery and on-line analytical processing, in large data warehouses, and against complex multimedia objects has kindled renewed research in optimization. Current optimizers have often proved inadequate to the needs of these new application areas. The primary goal of this project is to answer several engineering questions related to query optimization for new application areas. These engineering questions are listed above under Project Summary.

Area References

J. A. Blakeley, W. J. McKenna and G. Graefe, Experiences Building the Open OODB Query Optimizer, Proc. ACM SIGMOD Conf., Washington, D C, May 1993, 287.

P. Cellis, The Query Optimizer in Tandem's ServerWare SQL Product, Proceedings of VLDB '96, Page 592.

G. Graefe, The Cascades Framework for Query Optimization, Bulletin of the TC on Data Engineering, Vol 18 No. 3, September 1995, Pg 19-29

G. Graefe. Query Evaluation Techniques for Large Databases. ACM Computing Surveys June 1993.

G. Graefe. Volcano, An Extensible and Parallel Dataflow Query Processing System. IEEE Trans. on Knowledge and Data Eng. February 1994.

G. Graefe, The Cascades Framework for Query Optimization, Bulletin of the TC on Data Engineering, Vol 18 No. 3, September 1995, Pg 19-29

G. Graefe and W. J. McKenna, The Volcano Optimizer Generator: Extensibility and Efficient Search, Proc. IEEE Int'l. Conf. on Data Eng., Vienna, Austria, April 1993, 209.

N. Kabra, D. DeWitt, OPT++: An Object-Oriented Implementation for Extensible Database Query Optimization, to appear, VLDB Journal.

S. Zdonik and D. Maier, Readings in Object-Oriented Database Systems, Morgan Kaufmann, San Mateo, CA, 1990.