Inheritance vs. LIKE - need advice

William Bug <wb27@xxxxxxxxxx> · Tue, 02 Aug 2005 01:40:06 -0400

Dear pgsql-general moderators,

I sent this post out over the weekend.

Is there a reason why it is not getting posted to the list?

Many thanks for your assistance.  I really need some advice on this  
issue from people with experience using both inheritance mechanisms  
in PostgreSQL.

Cheers,
Bill Bug

Hi All,

Sorry to bring up the topic of PostgreSQL inheritance again, but  
after going through the archives and Google results, I still don't  
have a clear sense of whether my plan to implement a schema I'm  
working on is the preferred way to go.

First, I'd like to find out if the way I'm thinking about  
Inheritance vs. the SQL DDL CREATE TABLE modifier LIKE is correct.

The simplest analogy I can think of from OO techniques is PGSQL  
Inheritance corresponds to Class Inheritance, while LIKE is more  
like an inheritable Interface (Java) or Mixin (Ruby).  Inheritance  
maintains strict hierarchical relationships propagating the "Class"  
identity down through to all progeny. LIKE on the other hand simply  
provides a means to re-use a set of fields in an unlimited number  
tables without having to redefine those fields for each table you  
use them in.

This view is incomplete and far from a perfect fit to the way PGSQL  
Inheritance & LIKE work, but I think it's a helpful way of thinking  
of these 2 related mechanisms, when trying to decide how and when  
to use them in their current form.  As has been mentioned many  
times in posts here, as well as in the PGSQL docs, PGSQL  
Inheritance is only partial. Table fields are propagated as well as  
the group identity, but no other RDBMS objects created on the  
parent ( INDEXES, CONSTRAINTS & SEQUENCES primarily) are  
inherited.  As has been endlessly stated in posts here and  
elsewhere, this is a significant short-coming for the PGSQL  
Inheritance mechanism which those of us desirous of using  
Inheritance would love to see fixed (I understand it has been on  
the TODO list for many years, as this mechanism has been in the  
PGSQL code base for over 15 years).

I don't agree this makes PGSQL Inheritance unusable.  There are  
situations where I think it can still be useful, and I describe one  
below.  I'd welcome feedback on that opinion, however, as I'd hate  
to have my relative ignorance doom the data schema I'm about to  
fill with a few million rows of data to serious problems later.

The following is an example of using both Inheritance and LIKE in  
the context described above.

CREATE TABLE curation_info (
   created_by       TEXT    NOT NULL,
   create_date     TIMESTAMP WITH TIME ZONE,
   modified_by     TEXT    NOT NULL,
   mod_date          TIMESTAMP WITH TIME ZONE
);

CREATE TABLE book (
    id_pk                    SERIAL    PRIMARY KEY,
    title                        TEXT        NOT NULL,
    author_id_fk        INT            NOT NULL,
   publisher_id_fk    INT            NOT NULL,
    pub_year              DATE        NOT NULL,
    total_pages         INT            NOT NULL
    LIKE curation_info
);

CREATE TABLE novel (
    id_pk                    SERIAL    PRIMARY KEY,
    genre_id_fk        INT            NOT NULL
) INHERITS (book);

CREATE TABLE textbook (
    id_pk                    SERIAL    PRIMARY KEY,
    subject_id_fk    INT    NOT NULL
) INHERITS (book);

CREATE TABLE publisher (
    id_pk                    SERIAL    PRIMARY KEY,
    name                        TEXT        NOT NULL,
    address_id_fk        INT            NOT NULL,
    LIKE curation_info
);

CREATE TABLE author (
    id_pk                    SERIAL    PRIMARY KEY,
    last_name           TEXT        NOT NULL,
    first_name             TEXT        NOT NULL,
    middle_name         TEXT        NOT NULL,
    address_id_fk        INT            NOT NULL,
    LIKE curation_info
);

This is not the best way to model book info (for instance, books  
are only allowed to have 1 author in this schema), but it will help  
me to make my point.

Books, novels and textbooks will be considered equivalent in the  
context of many queries.  At the same time, there will be other  
queries where it will be important to consider novels & textbooks  
as distinct entities.  The PGSQL Inheritance mechanism easily  
supports both of these situations.

The curation fields listed in the 'curation_info' table are found  
ubiquitously in tables throughout many data schema.  However, it is  
not likely there would be a circumstance where you would want to  
consider all tables containing these fields "curatable entities" to  
be queried as a group.  That simply makes no sense.  In this case,  
LIKE seems to be the best way to propagate these fields, since it  
doesn't couple all tables containing them to the parent  
'curation_info' table.

As I see it, there are at least 3 major problems with adopting such  
a schema - despite the obvious efficiencies it offers (most of  
which have been reported elsewhere):
    1) none of the parent table ('book') CONSTRAINTS or INDEXES are  
propagated to the children.  This means if you want the children to  
have the same CONSTRAINTS - as you probably will - you need to  
build them yourself for each child table.
    2) the primary keys generated across the book, novel & textbook  
tables are completely uncouple and will definitely collide.  In  
other words, due to the fact that neither the SEQUENCE behind the  
'book.id_pk' SERIAL field, not the PK CONSTRAINTS & INDEX that  
comes with that field will automatically propagate to the child  
tables.  That is why the SQL DDL given above has an 'id_pk' SERIAL  
field in all 3 tables.  There may be some conditions where you want  
those PKs to be independent from one another, but those will be  
much less frequent than the times when you will require they all  
derive from the same SEQUENCE.
    3) The fields inherited from the 'curation_info' table via the  
LIKE modifier are in no way linked back to the table from which  
they originated, unlike a an Interface (in Java) or Mixin (in Ruby)  
would be.  If the 'mod_date' field is remove from 'curation_info'  
it will still remain in all the tables created using  
'curation_info' prior to making that change.  Same is true if a new  
field is added to 'curation_info'.  If you want that field to be  
represented in all those tables that had previously been created  
using the LIKE 'curation_info' modifier, you will have to re-CREATE  
those tables from scratch.

As I see it, '1' & '3' above are significant drawbacks with no  
obvious work-around, but they are not deal breakers.  I would still  
have reason to want to use both Inheritance and LIKE because of the  
efficiencies they provide.

'2' above has a simple and obvious work around which I've been  
surprised I've not been able to find posted anywhere (leading me to  
believe I must be missing something).  PGSQL automatically builds a  
SEQUENCE object on SERIAL fields.  The name of the SEQUENCE is  
simply the concatenated name of the table + SERIAL field with  
'_seq' appended to the end - e.g., for the 'book' table, it would  
be 'book_id_pk_seq'.  In order to guarantee the child tables use  
that same sequence, you simply declare them as follows instead of  
using the SERIAL type:

CREATE TABLE textbook (
    id_pk                    INT8     DEFAULT nextval 
('book_id_pk_seq')         NOT NULL,
    subject_id_fk      INT        NOT NULL
) INHERITS (book);
ALTER TABLE textbook ADD CONSTRAINT textbook_pk_c PRIMARY KEY(id_pk);

Is it a pain to have to write - and maintain - this extra SQL DDL?   
Yes.  Having said that, will it provide the desired behavior?  Most  
definitely yes - again, unless I'm missing something.

So this is how I plan to use INHERIT & LIKE.

My main reason for posting this here, is to get a chance to draw on  
the breadth of PostgreSQL experience out there to bring the major  
pitfalls in taking this approach to my attention.  I think whatever  
feedback folks have to offer will be very helpful to me and to  
others in search of guidance on this issue.

I also thought it would be helpful to present this for archive  
purposes, since, despite the fact much of what I say here is  
mentioned elsewhere, I had to search far and wide to find it all  
and consolidate my thinking on the topic, so it might save others  
some time to see it all in one place.

It might also be worth adding some of this "advice" - if the  
consensus is this view is reasonable given the current state of the  
Inheritance mechanism in PostgreSQL - to one of the official  
PostgreSQL docs - e.g., FAQ, etc..

Many thanks ahead of time for your feedback and patience in reading  
this through to the end.

Cheers,
Bill Bug

P.S.: Should this end up double posted, I apologize.  There was a  
problem with the list accepting my aliased email address.

Bill Bug
Senior Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

              http://www.postgresql.org/docs/faq