Department of Computer Science
New Mexico Institute of Mining and Technology
Speare Hall
801 Leroy Place
Socorro, NM 87801
Phone: (505) 835-5288
Fax : (505) 835-5587
Email: mazumdar@cs.nmt.edu
This work examines why it is sometimes possible to transform a global constraint that would appear to require a great deal of communication and synchronization, into a set of relatively simple local constraints that require little, if any, coordination among databases. There is both the theoretical question regarding what types of constraints can be efficiently maintained by local operations and the practical question of how to make a tool that will generate necessary local integrity checks that are dynamically adjustable. Our approach towards relaxing constraint checks is to design special predicates that capture the various notions of relaxation of constraints and then use them to derive local checks which will tolerate just the amount of violation prespecified by the relaxation.
The first result of our work is the identification of a class of types of constraints which can be efficiently maintained by local operations. This class includes: (a) orderings: total orders, partial orders, n-ary predicates involving a generalized notion of transitivity, (b) equivalence relations including referential integrity and disjoint sets, (c) polynomial inequalities with real coefficients. and (d) convex bodies and their complements. We also have a preliminary design for delayed checking of equivalence relation based constraints.
In order to address the practical aspects of our research, we have procured equipment, installed and tested software infrastructure, designed the system to perform reformulation of constraints, i.e., computation of the localized constraints and their dynamic adjustments resulting from updates of data and nodes leaving or crashing. This has enabled us to design and build an experimental testbed for testing our designed scheme for numeric constraints and for comparing the performance against a straightforward integrity maintenance scheme. One part includes a tool that generates necessary local integrity checks that are dynamically adjustable. Another part includes a simulation of local updates of data, constraint checking, and constraint adjustment using the above tool. Currently we are adding a third part for visualization.
(Q_1 x in r_1) (Q_2 y in r_2) p(x, y)
where p is a binary predicate, Q_i is either a universal (V) or existential (E) quantifier and r_i resides in node i. In vertical fragmentation, every tuple of the original table is stored at every node, but in tables that contain only a subset of the original attributes (usually the key attributes are present in every node). In this case, again considering a 2-node setting, an original constraint of the form(Vt in r) p(t.A, t.B)
where A and B are attributes of r becomes a distributed constraint when node 1 contains the A attribute and node 2 contains the B attribute and both contain a key attribute K:(Vx in r_1) (V y \in r_2) [(x.K = y.K) --> p(x.A, y.B)]
We have interesting solutions for both of these forms for a significant class of constraints.For the first form, we have solved the problem for constraints involving orderings (total and partial orders); we have a generalized notion of transitivity for n-ary predicates underlying these solutions. We also have a solution involving equivalence relation predicates. The dynamic adjustment of the local conditions at run-time depending on actual data works out well for both kinds of predicates.
An interesting solution for the vertical fragmentation case obtained by exploiting the fact that there is exactly one universal quantifier in the original constraint. We break up the constraint C into a set C_s = {C_1, C_2, ...} of possibly overlapping sufficient local constraints. Thus, instead of storing attributes (K, A) in node 1 and (K, B) in node 2 (where K is the key attribute) as a result of vertical fragmentation, we also store the constraint satisfied by each data point, i.e., we store (K,A,C\#) and (K,B,C\#) respectively, where the domain of C\# is the set C_s.
Another interesting development is our use of a geometric interpretation of a large class of inequality constraints. Consider, for example, a constraint of the form x^2 + y^2 < c^2. Geometrically, in two dimensions (the idea generalizes to a n-dimensions), the constraint is satisfied by all (x,y) points that fall inside the circle. Now any enclosed rectangle -a < x < a, -b < y < b represents a localizable sufficient condition since the condition -a < x < a is a unary predicate in x and -b < y < b in y. This blends synergistically with the idea of storing the constraint with the data because once the x component of a data tuple is stored in one site and the y component is stored in another, the unary predicate that they should satisfy are also stored; thus updates to the x (or the y) component that satisfy the unary predicate in x (or y), and hence keep the data point within the rectangle, need no communication. In addition to a top-down approach for certain predicates, we have developed a bottom-up approach based on an algorithm to find rectangles dynamically based on the actual data.
The long-range results of this work is that it will allow nontrivial interdatabase constraints to be maintained efficiently in spite of site autonomy. The current state of the art relies on application code for constraint preservation; such code is difficult to understand, maintain, and upgrade. For long, researchers have noted the need of energetic integrity design and the desirability of appropriate tools being available to the designer. Our tool will be of value to users of integrated databases in many domains, even to those who insist on writing application code to maintain interdatabase constraints and the software we are developing will be publicly available. In addition, this work will also impact on the support of disconnected operations in distributed systems; more specifically, mobile and wireless environments can use our work directly.
The project has enriched our infrastructure for research and teaching. It has involved two graduate students; one has completed a MS thesis and the other is a Ph.D. student currently working on the project. It has also involved one undergraduate student who is nearing completion of his studies. Regarding Curriculum development, a graduate seminar CS~589 Recent Topics in Databases have exposed students to this project within the general context of database integrity maintenance. It has also been included in the graduate database course CS~573.
S. Mazumdar. Optimizing Distributed Integrity Constraints. In Proceedings of the Third International Symposium on Database Systems for Advanced Applications (DASFAA-93). Taejon, Korea, 1993. pages 327--334.
Z. Yuan. Dynamic Localization of Global constraints in Distributed Databases. Masters Thesis. Department of Computer Science. New Mexico Tech. September 1997.
S. Mazumdar and Z. Yuan. Localizing Global Constraints: A Geometric Approach. In Proceedings of the 9th International Conference on Computing and Information. ICCI'98. To appear.
S. Mazumdar, R. Cline, and M. Pietrzyk. An Experimental Testbed for Constraint Localization. Technical Report. Department of Computer Science. New Mexico Tech. 98/3/402. March 1998.
Integrity is specified through a set of integrity constraints, which are assertions or predicates that each legal or consistent database state must satisfy. The simplest way of assuring integrity is of course to include the constraint as a test at the end of every transaction. However, the integrity constraint is usually a non-trivial boolean query whose evaluation may be costly in terms of execution time, and repeating these queries after every transaction implies a large overhead for transaction processing. It is easy to see that this method can be impractically expensive. Distributed databases exacerbate the problem of constraint maintenance. Relations are decomposed (by horizontal and vertical fragmentation) into fragments, replicated, and distributed. In this environment, an integrity constraint must be translated into a set of constraints on those fragments and copies, thus increasing the number of constraints to maintain. The naive method of checking constraints after each transaction is far worse in the distributed case since the checking would typically require data transfer as well as computation. Further, the penalty for allowing a transaction to execute with the intention of aborting it at commit time in the event of constraint violation is more severe since rollback and recovery must occur at all the sites in which the transaction participated.
User defined triggers are useful in integrity maintenance since they can check constraints only for those operations in a transaction that can potentially violate a constraint. However, specifying triggers can be an error-prone activity as their number and scope for interaction increases. The most useful strategy for many integrity constraints is their simplification into run-time checks. A long time ago, Nicolas urged researchers to concentrate on the problem of finding good constraints. But the effort in this direction has been very small compared with, say, the problem of choosing good relation forms.
J. M. Nicolas. Logic for Improving Integrity Checking in Relational Databases. Acta Informatica, 18(3):227--254, December 1982.
P. A. Bernstein, B. T. Blaustein, and E. M. Clarke. Fast Maintenance of Semantic Integrity Assertions Using Redundant Aggregate Data. In Proceedings of the Sixth International Conference on Very Large Databases, pages 126--136, 1980.
L. Henschen, W. McCune, and S. Naqvi. Compiling Constraint Checking Programs from First-Order Formulas. In H. Gallaire, J. Minker, and J. Nicolas, editors, Advances in Database Theory, volume 2, pages 145--169. Plenum Press, New York, 1984.
T. Sheard and D. Stemple. Automatic Verification of Database Transaction Safety. ACM Transactions on Database Systems, 12(3):322--368, September 1989.
Mobile and wireless environments.
Database Security.