Inheritance vs. LIKE - need advice

Hi All,

Sorry to bring up the topic of PostgreSQL inheritance again, but after going through the archives and Google results, I still don't have a clear sense of whether my plan to implement a schema I'm working on is the preferred way to go.

First, I'd like to find out if the way I'm thinking about Inheritance vs. the SQL DDL CREATE TABLE modifier LIKE is correct.

The simplest analogy I can think of from OO techniques is that PGSQL Inheritance corresponds to Class Inheritance, while LIKE is more like an inheritable Interface (Java) or Mixin (Ruby). Inheritance maintains strict hierarchical relationships, propagating the "Class" identity down through all progeny. LIKE, on the other hand, simply provides a means to re-use a set of fields in an unlimited number of tables without having to redefine those fields for each table you use them in.

This view is incomplete and far from a perfect fit to the way PGSQL Inheritance & LIKE work, but I think it's a helpful way of thinking of these 2 related mechanisms when trying to decide how and when to use them in their current form. As has been mentioned many times in posts here, as well as in the PGSQL docs, PGSQL Inheritance is only partial. Table fields are propagated as well as the group identity, but most other RDBMS objects created on the parent (primarily INDEXES, PRIMARY KEY, UNIQUE & FOREIGN KEY CONSTRAINTS, and SEQUENCES) are not inherited. As has been endlessly stated in posts here and elsewhere, this is a significant shortcoming of the PGSQL Inheritance mechanism, which those of us desirous of using Inheritance would love to see fixed (I understand it has been on the TODO list for many years, as this mechanism has been in the PGSQL code base for over 15 years).

I don't agree this makes PGSQL Inheritance unusable. There are situations where I think it can still be useful, and I describe one below. I'd welcome feedback on that opinion, however, as I'd hate to have my relative ignorance doom the data schema I'm about to fill with a few million rows of data to serious problems later.

The following is an example of using both Inheritance and LIKE in the context described above.

CREATE TABLE curation_info (
    created_by         TEXT      NOT NULL,
    create_date        TIMESTAMP WITH TIME ZONE,
    modified_by        TEXT      NOT NULL,
    mod_date           TIMESTAMP WITH TIME ZONE
);

CREATE TABLE book (
    id_pk              SERIAL    PRIMARY KEY,
    title              TEXT      NOT NULL,
    author_id_fk       INT       NOT NULL,
    publisher_id_fk    INT       NOT NULL,
    pub_year           DATE      NOT NULL,
    total_pages        INT       NOT NULL,
    LIKE curation_info
);

CREATE TABLE novel (
    id_pk                    SERIAL    PRIMARY KEY,
    genre_id_fk        INT            NOT NULL
) INHERITS (book);

CREATE TABLE textbook (
    id_pk                    SERIAL    PRIMARY KEY,
    subject_id_fk    INT    NOT NULL
) INHERITS (book);


CREATE TABLE publisher (
    id_pk                    SERIAL    PRIMARY KEY,
    name                        TEXT        NOT NULL,
    address_id_fk        INT            NOT NULL,
    LIKE curation_info
);

CREATE TABLE author (
    id_pk                    SERIAL    PRIMARY KEY,
    last_name           TEXT        NOT NULL,
    first_name             TEXT        NOT NULL,
    middle_name         TEXT        NOT NULL,
    address_id_fk        INT            NOT NULL,
    LIKE curation_info
);

This is not the best way to model book info (for instance, books are only allowed to have 1 author in this schema), but it will help me to make my point.

Books, novels and textbooks will be considered equivalent in the context of many queries. At the same time, there will be other queries where it will be important to consider novels & textbooks as distinct entities. The PGSQL Inheritance mechanism easily supports both of these situations.
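A minimal sketch of both kinds of query, assuming reduced, hypothetical versions of the tables above (the demo_ names and the sample rows are made up for illustration, and the keys are entered by hand to keep the example short):

```sql
-- Hypothetical, cut-down parent and children.
CREATE TABLE demo_book (
    id_pk    INT     PRIMARY KEY,
    title    TEXT    NOT NULL
);
CREATE TABLE demo_novel (
    genre_id_fk      INT    NOT NULL
) INHERITS (demo_book);
CREATE TABLE demo_textbook (
    subject_id_fk    INT    NOT NULL
) INHERITS (demo_book);

INSERT INTO demo_book     (id_pk, title)                VALUES (1, 'A plain book');
INSERT INTO demo_novel    (id_pk, title, genre_id_fk)   VALUES (2, 'A novel', 10);
INSERT INTO demo_textbook (id_pk, title, subject_id_fk) VALUES (3, 'A textbook', 20);

-- Books, novels and textbooks treated as equivalent:
-- a query on the parent also scans the children.
SELECT id_pk, title FROM demo_book;

-- The same rows, tagged with the table each one is actually stored in.
SELECT tableoid::regclass AS source_table, id_pk, title FROM demo_book;

-- Only the rows stored in the parent itself.
SELECT id_pk, title FROM ONLY demo_book;

-- Novels as a distinct entity, including their extra field.
SELECT id_pk, title, genre_id_fk FROM demo_novel;
```

Queries on the parent see only the parent's fields; child-specific fields such as genre_id_fk are available only when the child table is queried directly.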

The curation fields listed in the 'curation_info' table are found ubiquitously in tables throughout many data schemas. However, it is not likely there would be a circumstance where you would want to treat all tables containing these fields as "curatable entities" to be queried as a group. That simply makes no sense. In this case, LIKE seems to be the best way to propagate these fields, since it doesn't couple all tables containing them to the parent 'curation_info' table.

As I see it, there are at least 3 major problems with adopting such a schema - despite the obvious efficiencies it offers (most of which have been reported elsewhere):

1) The parent table's ('book') INDEXES and its PRIMARY KEY, UNIQUE & FOREIGN KEY CONSTRAINTS are not propagated to the children (as far as I can tell from the docs, only CHECK and NOT NULL constraints are). This means if you want the children to have the same CONSTRAINTS - as you probably will - you need to build them yourself for each child table.

2) The primary keys generated across the book, novel & textbook tables are completely uncoupled and will definitely collide. This is because neither the SEQUENCE behind the 'book.id_pk' SERIAL field nor the PK CONSTRAINT & INDEX that come with that field will automatically propagate to the child tables. That is why the SQL DDL given above has an 'id_pk' SERIAL field in all 3 tables. There may be some conditions where you want those PKs to be independent from one another, but those will be much less frequent than the times when you will require they all derive from the same SEQUENCE.

3) The fields copied from the 'curation_info' table via the LIKE modifier are in no way linked back to the table from which they originated, unlike the members of an Interface (in Java) or Mixin (in Ruby). If the 'mod_date' field is removed from 'curation_info', it will still remain in all the tables created using 'curation_info' prior to making that change. The same is true if a new field is added to 'curation_info'. If you want that field to be represented in all the tables that had previously been created using the LIKE 'curation_info' modifier, you will have to add it to each of them by hand with ALTER TABLE, or re-CREATE those tables from scratch.
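The first problem can be reproduced with a short sketch. The table names pk_parent and pk_child are hypothetical and exist only for this illustration:

```sql
-- Hypothetical parent with a primary key, and a child that adds nothing.
CREATE TABLE pk_parent (
    id_pk    INT     PRIMARY KEY,
    title    TEXT    NOT NULL
);
CREATE TABLE pk_child () INHERITS (pk_parent);

INSERT INTO pk_parent VALUES (1, 'stored in the parent');

-- Both inserts are accepted: the parent's PK index covers only rows stored
-- in pk_parent, and pk_child has no PK constraint of its own.
INSERT INTO pk_child VALUES (1, 'stored in the child');
INSERT INTO pk_child VALUES (1, 'stored in the child again');

-- Seen through the parent, the "primary key" value 1 now appears three times.
SELECT id_pk, count(*) AS copies
FROM pk_parent
GROUP BY id_pk
HAVING count(*) > 1;

-- A PK added by hand to the child only guards the child's own rows, so it
-- has to be created before any duplicates exist there:
-- ALTER TABLE pk_child ADD CONSTRAINT pk_child_pk_c PRIMARY KEY (id_pk);
```

Even with a PRIMARY KEY on every child, nothing in this sketch enforces uniqueness across the whole hierarchy; each constraint only covers the rows of its own table.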

As I see it, '1' & '3' above are significant drawbacks with no obvious work-around, but they are not deal breakers. I would still have reason to want to use both Inheritance and LIKE because of the efficiencies they provide.
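For '3', a small sketch of how a LIKE copy drifts away from its source. The tmpl_ table names and the 'reviewed_by' field are hypothetical, chosen only for this illustration:

```sql
-- Hypothetical template table and one table that copies its fields.
CREATE TABLE tmpl_curation_info (
    created_by     TEXT    NOT NULL,
    create_date    TIMESTAMP WITH TIME ZONE
);
CREATE TABLE tmpl_publisher (
    id_pk    SERIAL    PRIMARY KEY,
    name     TEXT      NOT NULL,
    LIKE tmpl_curation_info
);

-- Change the template after the copy has been made.
ALTER TABLE tmpl_curation_info ADD COLUMN reviewed_by TEXT;

-- The copy is unaffected: 'reviewed_by' is not among its fields.
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'tmpl_publisher'
ORDER BY ordinal_position;

-- The new field has to be carried over by hand, table by table.
ALTER TABLE tmpl_publisher ADD COLUMN reviewed_by TEXT;
```

By contrast, a field added to an INHERITS parent with ALTER TABLE ... ADD COLUMN does show up in its children, which is one more way the two mechanisms differ.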

'2' above has a simple and obvious workaround which I've been surprised I've not been able to find posted anywhere (leading me to believe I must be missing something). PGSQL automatically builds a SEQUENCE object for SERIAL fields. The name of the SEQUENCE is simply the name of the table and the name of the SERIAL field joined by an underscore, with '_seq' appended to the end - e.g., for the 'book' table, it would be 'book_id_pk_seq'. In order to guarantee the child tables use that same sequence, you simply declare them as follows instead of using the SERIAL type:

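CREATE TABLE textbook (
    -- Must match the type of book.id_pk (SERIAL is INT); INT8 here is rejected as a type conflict with the inherited field
    id_pk              INT       DEFAULT nextval('book_id_pk_seq') NOT NULL,
    subject_id_fk      INT       NOT NULL
) INHERITS (book);
ALTER TABLE textbook ADD CONSTRAINT textbook_pk_c PRIMARY KEY (id_pk);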

Is it a pain to have to write - and maintain - this extra SQL DDL? Yes. Having said that, will it provide the desired behavior? Most definitely yes - again, unless I'm missing something.
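A self-contained sketch of the shared-sequence approach in use, with hypothetical seq_ table names and made-up rows. The last statement shows one pitfall that remains:

```sql
-- Hypothetical parent; SERIAL creates the sequence seq_book_id_pk_seq.
CREATE TABLE seq_book (
    id_pk    SERIAL    PRIMARY KEY,
    title    TEXT      NOT NULL
);

-- The child redeclares id_pk with the same type and draws from the
-- parent's sequence. PostgreSQL reports a NOTICE about merging the field.
CREATE TABLE seq_textbook (
    id_pk            INT    DEFAULT nextval('seq_book_id_pk_seq') NOT NULL,
    subject_id_fk    INT    NOT NULL
) INHERITS (seq_book);
ALTER TABLE seq_textbook ADD CONSTRAINT seq_textbook_pk_c PRIMARY KEY (id_pk);

INSERT INTO seq_book     (title)                VALUES ('A plain book');
INSERT INTO seq_textbook (title, subject_id_fk) VALUES ('A textbook', 20);
INSERT INTO seq_book     (title)                VALUES ('Another plain book');

-- Keys generated through the shared sequence do not collide across tables.
SELECT tableoid::regclass AS source_table, id_pk, title
FROM seq_book
ORDER BY id_pk;

-- Pitfall: the sequence only helps when the default is used. An explicit
-- key that already exists in the parent is still accepted by the child,
-- because seq_textbook_pk_c checks only seq_textbook's own rows.
INSERT INTO seq_textbook (id_pk, title, subject_id_fk)
VALUES (1, 'Duplicate key', 20);
```

As far as I can tell from the CREATE TABLE docs, field defaults are themselves inherited, so a child that does not redeclare 'id_pk' at all may already draw from the parent's sequence; that is worth checking on your own version before relying on it.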

So this is how I plan to use INHERITS & LIKE.

My main reason for posting this here is to draw on the breadth of PostgreSQL experience out there and have the major pitfalls of this approach brought to my attention. I think whatever feedback folks have to offer will be very helpful to me and to others in search of guidance on this issue.

I also thought it would be helpful to present this for archive purposes, since, despite the fact much of what I say here is mentioned elsewhere, I had to search far and wide to find it all and consolidate my thinking on the topic, so it might save others some time to see it all in one place.

It might also be worth adding some of this "advice" - if the consensus is this view is reasonable given the current state of the Inheritance mechanism in PostgreSQL - to one of the official PostgreSQL docs - e.g., FAQ, etc.

Many thanks ahead of time for your feedback and patience in reading this through to the end.

Cheers,
Bill Bug

Bill Bug
Senior Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)





