Relational Inheritance Features.

Trent Shipley <tshipley@xxxxxxxx> · Fri, 6 Jan 2006 10:31:49 -0700

Relational Inheritance Supporting Features

Perhaps the most important deficit in Postgresql's current INHERITS model is 
hard to detect.  That deficit is the inability to name inheritance classes 
themselves.  One has to refer to the _per se_ class by referencing an 
associated table.  While it is nice to have anonymous classes, it is highly 
desirable that classes be first-class objects, and at a minimum a proper 
object should be able to take a name in its own right.[name]

Since a class' interface is the tuple for the associated table, classes should 
not be directly instantiated unless the class owns (a possibly anonymous) 
table.  As first-class objects, however, one should be able to DECLARE or 
DEFINE relational inheritance classes (or rather, the root for a tree or 
branch)  [define].    A define function is useful when creating a relational 
inheritance analog to a pure abstract class; that is, a class that cannot be 
directly instantiated. (An pure abstract class is not to be confused with an 
abstract class table.)

Actually, lack of DEFINE is a major gap in SQL itself.  Most modern 
programming languages make a distinction between *defining* a “multi-variable 
precursor” (sort of a light weight type) and *instantiating* a variable.  In 
SQL, definition is always conflated with instantiation using a CREATE 
expression.  It should be possible to DEFINE database objects and then CREATE 
the objects; create them after an indefinite interval, and perhaps 
repeatedly.  One should not have to repeatedly run verbose scripts.

Even more than completing than use in some still theoretical relational 
inheritance model, DEFINE would be of immediate practical use:

CREATE temporary tables, that often occur as clones and that are repeatedly 
created and dropped.
CREATE views that are often dropped then recreated.
Creating indexes that are repeatedly dropped.
Managing frequently dropped constraints.
If one could define a free-floating or “immaculate” index, it could be 
repeatedly instantiated and attached to columns in different tables.
If one could define an immaculate constraint, it could be repeatedly created 
and attached to tables in a schema.

 1.1 Some operations on relational classes.

With fully developed relational inheritance it would be highly desirable to 
re-order columns in descendant tables.  The first reason is a strong 
aesthetic argument.  One might, for example, always want some audit columns 
to be the last five values in any table in a schema.  A practical argument is 
that one could create distinct class signatures by arbitrarily recombining 
the order of columns.  Naturally, it would be convenient, if one could also 
refer to columns by name in procedural languages.

Relational inheritance classes should also support a form of polymorphism.  
Specifically, the name of an ancestor table (or the table itself) and the 
fully or sufficiently specified columns of the row's ordered form can be 
regarded as a distinct signature [signature].  When selecting, inserting, or 
updating signatures should be polymorphic.  Note that an incautious user (or 
overly permissive implementation, depending on perspective) could create 
ambiguous queries.  These could either be resolved by a tree traversal rule 
or detected and errored out.  Making ambiguous structures or ambiguous 
queries errors is obviously required by good design.

 1.2 Extending relations or classes

As I have mentioned elsewhere multiple inheritance requires merging multiple 
parent classes.  Merging multiple ancestries implies a mathematical order of 
operations.  It follows that when an INHERITS clause specifies more than one 
table, the designer should be able to use parentheses to explicitly specify 
the order for merging parent tables.

Of more general interest is that table attributes have scope.  This is a new 
dimension of complexity that does not exist in pure relational databases.  
For practical purposes, I anticipate that it will be sufficient only to 
specify local and subclass scope with phrases using the words LOCAL and 
CLASS.  Columns, constraints, indexes, and rules and triggers (event actions) 
can have either local or (sub)class scope.  

Local scope implies that the property (acts as if) it were created an managed 
independently for the top of the class, and possibly for each descendant.  
Note that how local properties are treated by inheritance is somewhat 
ambiguous.  Should these dependent objects be created or changed in 
descendants?  Since the answer is not clear I propose that the behavior must 
be explicitly specified.  When a local column, constraint, index, or event 
action is created, altered, or dropped one must specify whether or not the 
action cascades to descendants. (It should *not* cascade by default.)  
Likewise, when a table inherits from a class with local properties one should 
specify whether local properties are inherited.  They should be inherited by 
default.

Class constraints do not suffer from the semantic ambiguity of local 
constraints.  They are always inherited and any changes always apply to 
descendants.  That is what “class scope” means.  A class property acts as if 
there were a single instance of the property for the entire composite 
relation that is the class.

From a practical perspective, the distinction between a local and a class 
property will normally be invisible to a report writer, except in the case of 
a unique constraint.  The behavior of unique constraints is slightly more 
complicated than the behavior of other properties with respect to scope.  
Local unique constraints must be maintained separately for on each table 
within the class.  A (sub)class unique constraint will be maintained as a for 
the entire class composite relation as a unit. 

Creating a new class or adding new properties to a class is generally not 
problematic.  One ought to be able to DEFINE a new class either by declaring 
a class _per se_ or by defining or creating a class aware table.  Naturally, 
it should be possible to create trivial classes consisting of just one table, 
but it is not possible to do this with the current INHERITS grammar.

Obviously, adding a class or table to a class using the INHERITS clause is not 
problematic.  

As we consider defining and instantiating, it is worth asking whether there is 
a place for the anonymous table.  In Postgresql a table can be contained in 
an anonymous class.  Symmetry implies that a class should be able to contain 
an anonymous table.  Presumably this would be done by defining a class, 
omitting the clause that names the class' table, then instantiating the 
class. 

Furthermore, one should experience no problems (baring namespace conflicts) 
when adding columns, indexes, or event actions to a class.  

Adding constraints, however, can be problematic.  This is because they can 
conflict with existing data either in the table itself or in its descendants 
[immaculate].  

Note that when working with relational inheritance (as with constraint 
exclusion) one would often want to specify an abstract CHECK constraint by 
what might be called family or type.  For example, all constraints should 
perhaps be a range of BIGINT of the form A <= B, B < C.  Date constraints 
also often take this form.  Though admittedly a somewhat fuzzy concept, it 
would be useful if one could specify check constraint domains or templates.

Obviously, one expects that descendant tables inherit the properties of their 
parent class.  These properties include the columns of the ancestor table, 
the column order of the ancestor table, ancestor constraints, indexes, and 
event actions.  One currently would override a constraint with ALTER 
TABLE ... DROP CONSTRAINT.  Then, if desired, another constraint could be 
added.  Though effective, using ALTER TABLE to override table properties can 
be a bit difficult for a human code reader to scan.  Readability might be 
improved if table columns could be declared as explicitly PLURAL or 
explicitly UNCONSTRAINED.

Beyond what one normally thinks of as database constraints, a database 
implementing relational inheritance model should include several controls 
over how class properties are inherited.  These controls constrain the 
behavior of descendants.  Descendant constraints thus control the behavior of 
database objects rather than data so they can be thought of as a type of 
meta-constraint.  

Scope is a descendant constraint that has already been discussed.  In theory 
one could specify a grammar to scope inheritance to any list of generations.  
In practice the ability to specify this generation (LOCAL) or all descendants 
(CLASS) should suffice.

Whether an object is abstract or concrete is also a kind of descendant 
constraint.  There are a couple of different interpretations of “ABSTRACT” 
applicable to relational inheritance.  Commonly encountered is the case of a 
completely specified ancestral template table that should never contain data 
(even though it could).  In another case the object is not fully specified.  
Indeed, the object definition may be nearly empty.  In this case the object 
cannot contain data.  In either event, before data can be inserted into an 
abstract table (more properly an abstract class) a CONCRETE table must be 
derived from the parental abstract class' table.

Abstraction makes sense for classes, tables, and the table properties of 
columns, constraints, and
event actions.  Though hardly nonsense, there may be little practical point in 
declaring an index abstract.  A class may be declared abstract in its own 
right as it is defined.  (Indeed, abstraction implies the separation of 
definition and instantiation.)  A table cannot directly inherit from an class 
that has been directly declared abstract.  Instead it would be necessary to 
derive another class from the directly abstract class.  Note that an 
inheritance class has one direct property, its table.  If its table is 
abstract then the owning class is indirectly abstract.

A table may be directly declared abstract.  In addition, if any of a table's 
columns, constraints, indexes, or event actions are abstract the table is 
indirectly abstract.  It is reasonable to expect that abstract columns, 
constraints, and perhaps indexes will be abstract as incompletely specified 
placeholders.  An event action could also be abstract as an artifact of an 
object oriented programming language.

In addition, a class or table could be FINAL.

Columns, constraints, indexes, and event actions could all in theory control 
descendant scope.  One could, for example, oblige a unique constraint in all 
descendant tables to have local scope.  Similarly, one could provide a 
feature to lock out overriding a given column, constraint, index, or event 
action in descendant tables.  Likewise, one should be able to explicitly 
allow, or even require overriding.

Constraints (or constraint types), indexes, and event actions could all be 
forbidden, allowed, or required in descendant tables.  

Any time a class or is derived from another, the default behavior is that any 
class property (or sub-property) can be overridden.  To derive one class from 
another in Postgresql one would use the INHERITS clause in the context of a 
DEFINE or CREATE for a class or table.  This is the obvious time to alter 
table properties.  

Using ALTER TABLE (and perhaps ALTER CLASS) present another opportunity to 
override table properties.  Indeed, in the case of indexes ALTER may be the 
only way to override ancestral table properties.

At DEFINE or CREATE time I do not see overriding constraints, indexes, or 
event actions as particularly problematic.  Overriding columns could be more 
problematic, however.  To avoid casting up, it may be necessary to require 
that the type of an overridden column effectively be a domain of its 
ancestors [domain].

The other notable implication of, for example, constraint overriding, is that 
name spaces become ramified.  If class-table bar inherits from class-table 
foo and foo's constraint acme,  then when bar overrides constraint acme as 
acme' we need to distinguish between the two acmes with identical local 
names.  We need foo'acme and foo'bar'acme.  Postgresql does not currently 
support ramified name spaces.  

Modifying extant properties of existing classes (that is, using ALTER), is 
more problematic than creating a property or overriding a property when a 
class or table is first defined.  Changing a column produces the same problem 
that it would for a single table, except that the data incompatibility 
problem when changing to a more restrictive type extends to the entire 
composite table and not just the root table itself.  Adding or changing 
constraints also have the problem of being more restrictive than existing 
data in the composite table.  In addition, if the table is in a descendant 
class, changing a child property (especially a constraint) could conflict 
with an ancestor's descendant constraint.

To DROP classes or class properties is even more involved than altering a 
class.  Dropping an entire class from the network might or might not be 
problematic depending on its location.  When a class is a leaf, it can always 
be dropped.  When the class is an intermediate node there are at least four 
options options.  Deleting the class can be forbidden, the entire subtree can 
be pruned, all the subtrees formerly rooted in the deleted node can be freed 
as stand-alone classes, or one can try to merge any orphan subtrees into the 
existing tree.  Two approaches suggest themselves as merge strategies.  The 
deleted node could be replaced with a completely empty and anonymous class.  
Alternatively, one could try to make the orphan subtrees inherit directly 
from the deleted class' parent(s).

When the deleted node is the top node (and not also a leaf node)  ones options 
are similar to those for intermediate nodes, except that the option of 
grafting orphans to their grandparent(s) is unavailable.

Note that in the model suggested here it is suggested that dropping a class' 
table should be distinct from dropping the class _per se_.  DROP TABLE ought 
to result in a completely abstract class node.  Deleting a table in a leaf 
node is no problem.  When other objects depend on the deleted table the 
action the relevant actions are to either forbid the DROP TABLE or merge 
descendant class-tables with the now empty, abstract class.

As a side note, TRUNCATE also would need versions for single tables and 
composite tables.

Dropping columns, constraints, indexes, or event actions can all get tricky 
when it comes to scope.  In particular, one would want a way to say “drop 
this column even when it has been overridden.”  Furthermore, dropping a 
property that was REQUIRED by an ancestor table would cause the attempt to 
delete the property to fail.

 1.3 Relational X Network (Would relational inheritance need support outside 
SQL)

In Postgresql at present ONLY is the only tool for navigating an inheritance 
net.  I have already suggested that much more sophisticated tools for naming 
class nodes will be *needed*.  Many more, along the lines of XPath, would no 
doubt be desirable.

The problem of insufficient network navigation tools pales compared to the 
related problem of whether users would be satisfied with SQL as the query 
tool for the network organization that would be part of a database with 
relational inheritance.  Would users feel the need for a network oriented 
query language like XQuery?

 1.4 Relations other than tables 

There is also the question of whether relations other than tables might 
benefit from features of relational inheritance.  In Postgresql all of the 
following can be regarded as relations: tables (of various flavors), indexes, 
foreign constraints, unique constraints, views, and relational classes.  

One areas of particular interest is the degree to which defined but 
uninstantiated relations should be treated the same as instantiated 
relations.  Application of relational inheritance to views would also be of 
great interest once the problem of relational inheritance and tables was in 
hand.

 2 Conclusion

The most important point made in this essay is that some operations would be 
*very* useful indeed for database design in general even if there were no 
provision for relational inheritance.  First, relational databases should 
separate definition from instantiation.  In particular Postgresql should 
adopt a “DEFINE object definition_name / CREATE object_name FROM 
definition_name” feature in advance of its competitors.  The ability to 
define objects will come in very handy for temporary tables, creating 
variations on a table theme, and so on.  It will also eliminate the need to 
repeatedly run verbose SQL scripts.

The second insight of general interest is that it should be possible to 
recycle constraint and index declarations.  Furthermore, it should be 
possible to create immaculate definitions of constraints and indexes that are 
not attached to any particular table or view.  Immaculate properties would 
eliminate the need to repeatedly script nearly identical constraints and 
indexes across large numbers of relations in a schema.  

For features germane only to classes the most important observation is that 
relational inheritance classes should be first class database objects.  Most 
important, classes should be able to be named.  Also it should be possible to 
directly declare classes.  Developers should be able to create empty classes.  
Even more important, they should be able to create a class with only one 
table [multiple].

In addition, any reasonably complete implementation would need to allow for 
local and class scope, descendant constraints and cascade controls for ALTER 
and DROP.

Even more important is the question of what is to be gained by relational 
inheritance.  I have run across design problems where relational inheritance 
with sub-classing primary keys would have been *very* convenient.  One 
imagines, however, that from the perspective of a database designer trained 
in relational design, the hierarchical network dimension of relational 
inheritance will be “abused”.  (Of course, “abuse” would be a matter of 
perspective.)

Not having much personal experience, my arm-chair expectation for current 
practice is that OO application and relational database schemata are 
developed in separately.  Moving data from the application to the database 
thus requires a translation layer. (The translation layer usually being _ad 
hoc_ and distributed throughout the application rather than being a discrete 
module.)  Report writers, however, can directly query the “well designed” 
relational schema.    

With the object oriented potential of relational inheritance, I would expect 
to see a development pattern where an object oriented program is translated 
into an object oriented schema.  One could then build a translation layer of 
views allowing indirect query of the underlying data, but what one really 
expects is that maintenance database developers and report writers will have 
to directly query an object oriented network database using SQL on an _ad 
hoc_ basis. 

My own strong suspicion is that with relational inheritance many object 
oriented programmers will design their own schemata.  Or, in what will amount 
to the same thing, schemata will be developed based directly on object 
oriented analyses.  Another likely scenario is that when a project gets 
behind, relational schemata get thrown out to speed application development.

The net effect will be to simply move complexity downstream rather than 
dealing with it early on.  Object oriented programmers will have a trivial 
problem putting data into databases slavishly parallel to object oriented 
data structures.  At the other end of the development food chain, report 
writers will have a devil of a time writing SQL to extract data from what 
will have effectively become network databases.

------

[name]: Perhaps one could add a clause to CREATE TABLE like [CLASS [NAME] [IS] 
_class_name_].

[define]: One would choose DECLARE, DEFINE, or some other phrase based on what 
has not been reserved by the current SQL standard.  Hereafter, I use “define” 
and “DEFINE”.

[signature]: For those who prefer safety to power, a signature could be based 
on the fully or sufficiently specified *unordered* columns of a row and an 
ancestor class or table name.

[immaculate]: It seems a shame that constraint names can only be used to 
indicate what constraint one wants to drop.  Where there is a name, there is 
an invitation for reuse.  In all database schemata I have known check 
constraints and foreign key constraints (and indexed columns) are repeatedly 
declared with all but identical lengthy SQL expressions.  When a constraint 
is first declared in-line how much better it would be if the initial form 
could be reused.

An even better solution would be the ability to define the form of a 
constraint (or index) without reference to a concrete table.  I think of this 
as an “immaculate” definition of a normally dependent object.  One could then 
instantiate multiple objects based on the immaculate definition everywhere 
the otherwise tediously repetitive constraint declaration would be needed.  
This conforms to the best practice of storing information once and reusing it 
many times.  It should cut down on errors and reduce maintenance overhead.

[domain]:  As an example of overriding with domains BIGINT could beget BIGINT 
that begets INTEGER that begets the user defined domain of whole_integer 
(where whole_integer is not to be confused with unsigned two octet integer).

[multiple]:  For the sake of simplicity I have assumed that each class 
contained either zero or one tables.  Not discussed in this paper is the 
possibility that a class could hold multiple tables (or mix of tables, views, 
relational inheritance classes, ...).  For example, class foo contains tables 
A, B, and C.  Class bar inherits from foo and thus contains A', B', and C'.