Relational Inheritance Supporting Features Perhaps the most important deficit in Postgresql's current INHERITS model is hard to detect. That deficit is the inability to name inheritance classes themselves. One has to refer to the _per se_ class by referencing an associated table. While it is nice to have anonymous classes, it is highly desirable that classes be first-class objects, and at a minimum a proper object should be able to take a name in its own right.[name] Since a class' interface is the tuple for the associated table, classes should not be directly instantiated unless the class owns (a possibly anonymous) table. As first-class objects, however, one should be able to DECLARE or DEFINE relational inheritance classes (or rather, the root for a tree or branch) [define]. A define function is useful when creating a relational inheritance analog to a pure abstract class; that is, a class that cannot be directly instantiated. (An pure abstract class is not to be confused with an abstract class table.) Actually, lack of DEFINE is a major gap in SQL itself. Most modern programming languages make a distinction between *defining* a “multi-variable precursor” (sort of a light weight type) and *instantiating* a variable. In SQL, definition is always conflated with instantiation using a CREATE expression. It should be possible to DEFINE database objects and then CREATE the objects; create them after an indefinite interval, and perhaps repeatedly. One should not have to repeatedly run verbose scripts. Even more than completing than use in some still theoretical relational inheritance model, DEFINE would be of immediate practical use: CREATE temporary tables, that often occur as clones and that are repeatedly created and dropped. CREATE views that are often dropped then recreated. Creating indexes that are repeatedly dropped. Managing frequently dropped constraints. If one could define a free-floating or “immaculate” index, it could be repeatedly instantiated and attached to columns in different tables. If one could define an immaculate constraint, it could be repeatedly created and attached to tables in a schema. 1.1 Some operations on relational classes. With fully developed relational inheritance it would be highly desirable to re-order columns in descendant tables. The first reason is a strong aesthetic argument. One might, for example, always want some audit columns to be the last five values in any table in a schema. A practical argument is that one could create distinct class signatures by arbitrarily recombining the order of columns. Naturally, it would be convenient, if one could also refer to columns by name in procedural languages. Relational inheritance classes should also support a form of polymorphism. Specifically, the name of an ancestor table (or the table itself) and the fully or sufficiently specified columns of the row's ordered form can be regarded as a distinct signature [signature]. When selecting, inserting, or updating signatures should be polymorphic. Note that an incautious user (or overly permissive implementation, depending on perspective) could create ambiguous queries. These could either be resolved by a tree traversal rule or detected and errored out. Making ambiguous structures or ambiguous queries errors is obviously required by good design. 1.2 Extending relations or classes As I have mentioned elsewhere multiple inheritance requires merging multiple parent classes. Merging multiple ancestries implies a mathematical order of operations. It follows that when an INHERITS clause specifies more than one table, the designer should be able to use parentheses to explicitly specify the order for merging parent tables. Of more general interest is that table attributes have scope. This is a new dimension of complexity that does not exist in pure relational databases. For practical purposes, I anticipate that it will be sufficient only to specify local and subclass scope with phrases using the words LOCAL and CLASS. Columns, constraints, indexes, and rules and triggers (event actions) can have either local or (sub)class scope. Local scope implies that the property (acts as if) it were created an managed independently for the top of the class, and possibly for each descendant. Note that how local properties are treated by inheritance is somewhat ambiguous. Should these dependent objects be created or changed in descendants? Since the answer is not clear I propose that the behavior must be explicitly specified. When a local column, constraint, index, or event action is created, altered, or dropped one must specify whether or not the action cascades to descendants. (It should *not* cascade by default.) Likewise, when a table inherits from a class with local properties one should specify whether local properties are inherited. They should be inherited by default. Class constraints do not suffer from the semantic ambiguity of local constraints. They are always inherited and any changes always apply to descendants. That is what “class scope” means. A class property acts as if there were a single instance of the property for the entire composite relation that is the class. From a practical perspective, the distinction between a local and a class property will normally be invisible to a report writer, except in the case of a unique constraint. The behavior of unique constraints is slightly more complicated than the behavior of other properties with respect to scope. Local unique constraints must be maintained separately for on each table within the class. A (sub)class unique constraint will be maintained as a for the entire class composite relation as a unit. Creating a new class or adding new properties to a class is generally not problematic. One ought to be able to DEFINE a new class either by declaring a class _per se_ or by defining or creating a class aware table. Naturally, it should be possible to create trivial classes consisting of just one table, but it is not possible to do this with the current INHERITS grammar. Obviously, adding a class or table to a class using the INHERITS clause is not problematic. As we consider defining and instantiating, it is worth asking whether there is a place for the anonymous table. In Postgresql a table can be contained in an anonymous class. Symmetry implies that a class should be able to contain an anonymous table. Presumably this would be done by defining a class, omitting the clause that names the class' table, then instantiating the class. Furthermore, one should experience no problems (baring namespace conflicts) when adding columns, indexes, or event actions to a class. Adding constraints, however, can be problematic. This is because they can conflict with existing data either in the table itself or in its descendants [immaculate]. Note that when working with relational inheritance (as with constraint exclusion) one would often want to specify an abstract CHECK constraint by what might be called family or type. For example, all constraints should perhaps be a range of BIGINT of the form A <= B, B < C. Date constraints also often take this form. Though admittedly a somewhat fuzzy concept, it would be useful if one could specify check constraint domains or templates. Obviously, one expects that descendant tables inherit the properties of their parent class. These properties include the columns of the ancestor table, the column order of the ancestor table, ancestor constraints, indexes, and event actions. One currently would override a constraint with ALTER TABLE ... DROP CONSTRAINT. Then, if desired, another constraint could be added. Though effective, using ALTER TABLE to override table properties can be a bit difficult for a human code reader to scan. Readability might be improved if table columns could be declared as explicitly PLURAL or explicitly UNCONSTRAINED. Beyond what one normally thinks of as database constraints, a database implementing relational inheritance model should include several controls over how class properties are inherited. These controls constrain the behavior of descendants. Descendant constraints thus control the behavior of database objects rather than data so they can be thought of as a type of meta-constraint. Scope is a descendant constraint that has already been discussed. In theory one could specify a grammar to scope inheritance to any list of generations. In practice the ability to specify this generation (LOCAL) or all descendants (CLASS) should suffice. Whether an object is abstract or concrete is also a kind of descendant constraint. There are a couple of different interpretations of “ABSTRACT” applicable to relational inheritance. Commonly encountered is the case of a completely specified ancestral template table that should never contain data (even though it could). In another case the object is not fully specified. Indeed, the object definition may be nearly empty. In this case the object cannot contain data. In either event, before data can be inserted into an abstract table (more properly an abstract class) a CONCRETE table must be derived from the parental abstract class' table. Abstraction makes sense for classes, tables, and the table properties of columns, constraints, and event actions. Though hardly nonsense, there may be little practical point in declaring an index abstract. A class may be declared abstract in its own right as it is defined. (Indeed, abstraction implies the separation of definition and instantiation.) A table cannot directly inherit from an class that has been directly declared abstract. Instead it would be necessary to derive another class from the directly abstract class. Note that an inheritance class has one direct property, its table. If its table is abstract then the owning class is indirectly abstract. A table may be directly declared abstract. In addition, if any of a table's columns, constraints, indexes, or event actions are abstract the table is indirectly abstract. It is reasonable to expect that abstract columns, constraints, and perhaps indexes will be abstract as incompletely specified placeholders. An event action could also be abstract as an artifact of an object oriented programming language. In addition, a class or table could be FINAL. Columns, constraints, indexes, and event actions could all in theory control descendant scope. One could, for example, oblige a unique constraint in all descendant tables to have local scope. Similarly, one could provide a feature to lock out overriding a given column, constraint, index, or event action in descendant tables. Likewise, one should be able to explicitly allow, or even require overriding. Constraints (or constraint types), indexes, and event actions could all be forbidden, allowed, or required in descendant tables. Any time a class or is derived from another, the default behavior is that any class property (or sub-property) can be overridden. To derive one class from another in Postgresql one would use the INHERITS clause in the context of a DEFINE or CREATE for a class or table. This is the obvious time to alter table properties. Using ALTER TABLE (and perhaps ALTER CLASS) present another opportunity to override table properties. Indeed, in the case of indexes ALTER may be the only way to override ancestral table properties. At DEFINE or CREATE time I do not see overriding constraints, indexes, or event actions as particularly problematic. Overriding columns could be more problematic, however. To avoid casting up, it may be necessary to require that the type of an overridden column effectively be a domain of its ancestors [domain]. The other notable implication of, for example, constraint overriding, is that name spaces become ramified. If class-table bar inherits from class-table foo and foo's constraint acme, then when bar overrides constraint acme as acme' we need to distinguish between the two acmes with identical local names. We need foo'acme and foo'bar'acme. Postgresql does not currently support ramified name spaces. Modifying extant properties of existing classes (that is, using ALTER), is more problematic than creating a property or overriding a property when a class or table is first defined. Changing a column produces the same problem that it would for a single table, except that the data incompatibility problem when changing to a more restrictive type extends to the entire composite table and not just the root table itself. Adding or changing constraints also have the problem of being more restrictive than existing data in the composite table. In addition, if the table is in a descendant class, changing a child property (especially a constraint) could conflict with an ancestor's descendant constraint. To DROP classes or class properties is even more involved than altering a class. Dropping an entire class from the network might or might not be problematic depending on its location. When a class is a leaf, it can always be dropped. When the class is an intermediate node there are at least four options options. Deleting the class can be forbidden, the entire subtree can be pruned, all the subtrees formerly rooted in the deleted node can be freed as stand-alone classes, or one can try to merge any orphan subtrees into the existing tree. Two approaches suggest themselves as merge strategies. The deleted node could be replaced with a completely empty and anonymous class. Alternatively, one could try to make the orphan subtrees inherit directly from the deleted class' parent(s). When the deleted node is the top node (and not also a leaf node) ones options are similar to those for intermediate nodes, except that the option of grafting orphans to their grandparent(s) is unavailable. Note that in the model suggested here it is suggested that dropping a class' table should be distinct from dropping the class _per se_. DROP TABLE ought to result in a completely abstract class node. Deleting a table in a leaf node is no problem. When other objects depend on the deleted table the action the relevant actions are to either forbid the DROP TABLE or merge descendant class-tables with the now empty, abstract class. As a side note, TRUNCATE also would need versions for single tables and composite tables. Dropping columns, constraints, indexes, or event actions can all get tricky when it comes to scope. In particular, one would want a way to say “drop this column even when it has been overridden.” Furthermore, dropping a property that was REQUIRED by an ancestor table would cause the attempt to delete the property to fail. 1.3 Relational X Network (Would relational inheritance need support outside SQL) In Postgresql at present ONLY is the only tool for navigating an inheritance net. I have already suggested that much more sophisticated tools for naming class nodes will be *needed*. Many more, along the lines of XPath, would no doubt be desirable. The problem of insufficient network navigation tools pales compared to the related problem of whether users would be satisfied with SQL as the query tool for the network organization that would be part of a database with relational inheritance. Would users feel the need for a network oriented query language like XQuery? 1.4 Relations other than tables There is also the question of whether relations other than tables might benefit from features of relational inheritance. In Postgresql all of the following can be regarded as relations: tables (of various flavors), indexes, foreign constraints, unique constraints, views, and relational classes. One areas of particular interest is the degree to which defined but uninstantiated relations should be treated the same as instantiated relations. Application of relational inheritance to views would also be of great interest once the problem of relational inheritance and tables was in hand. 2 Conclusion The most important point made in this essay is that some operations would be *very* useful indeed for database design in general even if there were no provision for relational inheritance. First, relational databases should separate definition from instantiation. In particular Postgresql should adopt a “DEFINE object definition_name / CREATE object_name FROM definition_name” feature in advance of its competitors. The ability to define objects will come in very handy for temporary tables, creating variations on a table theme, and so on. It will also eliminate the need to repeatedly run verbose SQL scripts. The second insight of general interest is that it should be possible to recycle constraint and index declarations. Furthermore, it should be possible to create immaculate definitions of constraints and indexes that are not attached to any particular table or view. Immaculate properties would eliminate the need to repeatedly script nearly identical constraints and indexes across large numbers of relations in a schema. For features germane only to classes the most important observation is that relational inheritance classes should be first class database objects. Most important, classes should be able to be named. Also it should be possible to directly declare classes. Developers should be able to create empty classes. Even more important, they should be able to create a class with only one table [multiple]. In addition, any reasonably complete implementation would need to allow for local and class scope, descendant constraints and cascade controls for ALTER and DROP. Even more important is the question of what is to be gained by relational inheritance. I have run across design problems where relational inheritance with sub-classing primary keys would have been *very* convenient. One imagines, however, that from the perspective of a database designer trained in relational design, the hierarchical network dimension of relational inheritance will be “abused”. (Of course, “abuse” would be a matter of perspective.) Not having much personal experience, my arm-chair expectation for current practice is that OO application and relational database schemata are developed in separately. Moving data from the application to the database thus requires a translation layer. (The translation layer usually being _ad hoc_ and distributed throughout the application rather than being a discrete module.) Report writers, however, can directly query the “well designed” relational schema. With the object oriented potential of relational inheritance, I would expect to see a development pattern where an object oriented program is translated into an object oriented schema. One could then build a translation layer of views allowing indirect query of the underlying data, but what one really expects is that maintenance database developers and report writers will have to directly query an object oriented network database using SQL on an _ad hoc_ basis. My own strong suspicion is that with relational inheritance many object oriented programmers will design their own schemata. Or, in what will amount to the same thing, schemata will be developed based directly on object oriented analyses. Another likely scenario is that when a project gets behind, relational schemata get thrown out to speed application development. The net effect will be to simply move complexity downstream rather than dealing with it early on. Object oriented programmers will have a trivial problem putting data into databases slavishly parallel to object oriented data structures. At the other end of the development food chain, report writers will have a devil of a time writing SQL to extract data from what will have effectively become network databases. ------ [name]: Perhaps one could add a clause to CREATE TABLE like [CLASS [NAME] [IS] _class_name_]. [define]: One would choose DECLARE, DEFINE, or some other phrase based on what has not been reserved by the current SQL standard. Hereafter, I use “define” and “DEFINE”. [signature]: For those who prefer safety to power, a signature could be based on the fully or sufficiently specified *unordered* columns of a row and an ancestor class or table name. [immaculate]: It seems a shame that constraint names can only be used to indicate what constraint one wants to drop. Where there is a name, there is an invitation for reuse. In all database schemata I have known check constraints and foreign key constraints (and indexed columns) are repeatedly declared with all but identical lengthy SQL expressions. When a constraint is first declared in-line how much better it would be if the initial form could be reused. An even better solution would be the ability to define the form of a constraint (or index) without reference to a concrete table. I think of this as an “immaculate” definition of a normally dependent object. One could then instantiate multiple objects based on the immaculate definition everywhere the otherwise tediously repetitive constraint declaration would be needed. This conforms to the best practice of storing information once and reusing it many times. It should cut down on errors and reduce maintenance overhead. [domain]: As an example of overriding with domains BIGINT could beget BIGINT that begets INTEGER that begets the user defined domain of whole_integer (where whole_integer is not to be confused with unsigned two octet integer). [multiple]: For the sake of simplicity I have assumed that each class contained either zero or one tables. Not discussed in this paper is the possibility that a class could hold multiple tables (or mix of tables, views, relational inheritance classes, ...). For example, class foo contains tables A, B, and C. Class bar inherits from foo and thus contains A', B', and C'.