Re: Some thoughts on table inheritance (which is uniquely awesome on PostgreSQL)

Chris Travers <chris.travers@xxxxxxxxx> · Thu, 23 Aug 2012 18:44:19 -0700

On Thu, Aug 23, 2012 at 12:36 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
> On Wed, Aug 22, 2012 at 10:22 PM, Chris Travers <chris.travers@xxxxxxxxx> wrote:
>> I have now been working with table inheritance for a while and after
>> starting to grapple with many of the use cases it has have become
>> increasingly impressed with this feature.  I also think that some of
>> the apparent limitations fundamentally follow from the support for
>> multiple inheritance, and multiple inheritance itself is so useful I
>> would not want to see this go away.  Inheritance really starts to come
>> to its own once you start using table methods, and some features that
>> are useful in some sorts of inheritance modelling are useless in
>> others.
>
> The problem with postgres table inheritance is that it doesn't really
> solve the problem that people wanted solved: to be able to define an
> set of specific extra attributes for each row depending on some
> characteristic of that row.  The feature only tantalizingly
> masquerades as such.

This is true to some extent.  I may have found a solution there, which
is to build your inheritance solutions on top of the solutions used
for table partitioning.  It's one reason why I say you should start
working with it on table partitioning before you try to do set/subset
modelling elsewhere.  Interestingly one appealing solution doesn't
really work (which is to put a check constraint which checks the
tableoid column, presumably because this isn't set on insert until
after the check constraint fires).

One thing I have found looking through Oracle and DB2 docs is that
their table inheritance seems to have all the same problems as ours
and their solutions to these problems seem rather.... broken from a
pure relational perspective.

For example, Oracle and DB2 make extensive use of OID's here (which
must be recorded in some sort of system catalog somewhere given what
they do with them), and they have functions to take a "reference" to a
row and operators to "dereference" the row.  This sounds all good and
well until you come across the IS DANGLING operator, which returns
true when the reference no longer is there...

In other words, as far as I can see nobody else has come up with a
sane foreign key solution for inherited tables either.
>
> Until it found use in table partitioning, I found the inheritance
> feature to be basically useless.

I think one can actually borrow techniques from table partitioning to
solve the problems associated with inheritance.

However here's what turned me around on table inheritance:

1)  First, in LedgerSMB, we started using it to create consistent
interfaces to sets of storage tables.  The storage tables would behave
differently, but would inherit essentially interfaces from their
parents.  In this regard, you can think of an inheritance tree as a
partitioned table set, but where the partitioning is necessary because
foreign key fields reference different tables in different children.
We use this for example, to avoid having to have a global notes table
or global file attachments table and it gives us clear control over
where these can be attached along with central maintenance of data
structures.  In cases, like with file attachments, where foreign keys
to inheritance trees ended up being needed, we started out with a more
complex but workable solution but I think are going to a simpler one.
This is a good thing.

In essence what we did was use inheritance to give us variable target
tables for a foreign key column.  I would still like to see
inheritable foreign key constraints because that would make some
things a lot easier, but the idea that foreign keys are not, by
default, copied in, means that you can override the destination in the
child table.  It isn't the use documented but it actually works very
well.

2)  Secondly I re-read Stonebraker's "Object-Relational Database:  The
Next Wave" and I had a rather sudden epiphany.  Relational databases
are about modelling your data so you can ensure consistency and gain
as many answers as you can.  Object-relational modelling adds
interfaces (possibly written in arbitrary programming languages) to
derive additional information from stored information.  The example he
gives could be summarized in English to be "Give me all pictures of
sunsets taken within 20 miles of Sacramento" where whether a picture
is of a sunset is determined by analyzing the graphic itself.  Thus
you have to add features to allow you to plug into the query to answer
that question, and you have to have a planner capable of optimizing
such a query.

I also read some other papers which discussed table inheritance and
what sort of modelling problems it was designed to solve (the main one
is actually part/whole modelling where a row may be a whole in itself
and also a part of another whole--- for example we might sell timing
belts, but they might also come included in an engine assembly).

3)  I was talking with Matt Trout regarding object-oriented
programming in Perl, and he turned me on to Moose::Role as essentially
an interface class.  It cannot be instantiated and one would not
simply inherit it in order to instantiate it.  Rather it provides an
ability to assemble classes from re-usable pieces and thus provide
consistent interfaces across a project.    This has become a key to my
understanding of the use cases for multiple inheritance in PostgreSQL,
namely that you can define interfaces across column combinations and
combine those column combinations into a table.  The parent tables are
then largely uninteresting.

Viewed from this perspective, multiple inheritance gives you something
similar to what you get in the Informix examples I have seen regarding
embedding structured data type objects in columns in the table, but it
is superior to that because the individual columns can still be easily
queried using simple, relational queries.  No need to do something
like select customer.display_name() from invoices where .... just in
order to avoid getting something back in tuple notation.  Although we
could still do:  select i.customer_display_name from invoices i where
.... if we want to use derived values.

These things have turned me around from seeing inheritance as a
partially implemented, not particularly useful misfeature into
something I think is actually both extremely useful and can/should be
further developed in the future.  I also think that once solutions to
the harder modelling problems (like set/subset) are well understood
and documented, that more modest efforts can be used to smooth what
sharp corners remain.  However it is *very* hard to see value in a
feature when the examples in the documentation lead people down paths
which cause real pain for db professionals, especially when the docs
are, on the whole, of the quality they are in this project.
Inheritance can still solve the problems it was intended to solve, but
it must be used differently.  Given the existing solutions out there
(which all seem to be based on Informix's half-solution for this
problem), I think we may be in a position to do this right.

Best Wishes,
Chris Travers

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general