Hi All,
Sorry to bring up the topic of PostgreSQL inheritance again, but
after going through the archives and Google results, I still don't
have a clear sense of whether my plan to implement a schema I'm
working on is the preferred way to go.
First, I'd like to find out if the way I'm thinking about Inheritance
vs. the SQL DDL CREATE TABLE modifier LIKE is correct.
The simplest analogy I can think of from OO techniques is PGSQL
Inheritance corresponds to Class Inheritance, while LIKE is more like
an inheritable Interface (Java) or Mixin (Ruby). Inheritance
maintains strict hierarchical relationships propagating the "Class"
identity down through to all progeny. LIKE on the other hand simply
provides a means to re-use a set of fields in an unlimited number
tables without having to redefine those fields for each table you use
them in.
This view is incomplete and far from a perfect fit to the way PGSQL
Inheritance & LIKE work, but I think it's a helpful way of thinking
of these 2 related mechanisms, when trying to decide how and when to
use them in their current form. As has been mentioned many times in
posts here, as well as in the PGSQL docs, PGSQL Inheritance is only
partial. Table fields are propagated as well as the group identity,
but no other RDBMS objects created on the parent ( INDEXES,
CONSTRAINTS & SEQUENCES primarily) are inherited. As has been
endlessly stated in posts here and elsewhere, this is a significant
short-coming for the PGSQL Inheritance mechanism which those of us
desirous of using Inheritance would love to see fixed (I understand
it has been on the TODO list for many years, as this mechanism has
been in the PGSQL code base for over 15 years).
I don't agree this makes PGSQL Inheritance unusable. There are
situations where I think it can still be useful, and I describe one
below. I'd welcome feedback on that opinion, however, as I'd hate to
have my relative ignorance doom the data schema I'm about to fill
with a few million rows of data to serious problems later.
The following is an example of using both Inheritance and LIKE in the
context described above.
CREATE TABLE curation_info (
created_by TEXT NOT NULL,
create_date TIMESTAMP WITH TIME ZONE,
modified_by TEXT NOT NULL,
mod_date TIMESTAMP WITH TIME ZONE
);
CREATE TABLE book (
id_pk SERIAL PRIMARY KEY,
title TEXT NOT NULL,
author_id_fk INT NOT NULL,
publisher_id_fk INT NOT NULL,
pub_year DATE NOT NULL,
total_pages INT NOT NULL
LIKE curation_info
);
CREATE TABLE novel (
id_pk SERIAL PRIMARY KEY,
genre_id_fk INT NOT NULL
) INHERITS (book);
CREATE TABLE textbook (
id_pk SERIAL PRIMARY KEY,
subject_id_fk INT NOT NULL
) INHERITS (book);
CREATE TABLE publisher (
id_pk SERIAL PRIMARY KEY,
name TEXT NOT NULL,
address_id_fk INT NOT NULL,
LIKE curation_info
);
CREATE TABLE author (
id_pk SERIAL PRIMARY KEY,
last_name TEXT NOT NULL,
first_name TEXT NOT NULL,
middle_name TEXT NOT NULL,
address_id_fk INT NOT NULL,
LIKE curation_info
);
This is not the best way to model book info (for instance, books are
only allowed to have 1 author in this schema), but it will help me to
make my point.
Books, novels and textbooks will be considered equivalent in the
context of many queries. At the same time, there will be other
queries where it will be important to consider novels & textbooks as
distinct entities. The PGSQL Inheritance mechanism easily supports
both of these situations.
The curation fields listed in the 'curation_info' table are found
ubiquitously in tables throughout many data schema. However, it is
not likely there would be a circumstance where you would want to
consider all tables containing these fields "curatable entities" to
be queried as a group. That simply makes no sense. In this case,
LIKE seems to be the best way to propagate these fields, since it
doesn't couple all tables containing them to the parent
'curation_info' table.
As I see it, there are at least 3 major problems with adopting such a
schema - despite the obvious efficiencies it offers (most of which
have been reported elsewhere):
1) none of the parent table ('book') CONSTRAINTS or INDEXES are
propagated to the children. This means if you want the children to
have the same CONSTRAINTS - as you probably will - you need to build
them yourself for each child table.
2) the primary keys generated across the book, novel & textbook
tables are completely uncouple and will definitely collide. In other
words, due to the fact that neither the SEQUENCE behind the
'book.id_pk' SERIAL field, not the PK CONSTRAINTS & INDEX that comes
with that field will automatically propagate to the child tables.
That is why the SQL DDL given above has an 'id_pk' SERIAL field in
all 3 tables. There may be some conditions where you want those PKs
to be independent from one another, but those will be much less
frequent than the times when you will require they all derive from
the same SEQUENCE.
3) The fields inherited from the 'curation_info' table via the
LIKE modifier are in no way linked back to the table from which they
originated, unlike a an Interface (in Java) or Mixin (in Ruby) would
be. If the 'mod_date' field is remove from 'curation_info' it will
still remain in all the tables created using 'curation_info' prior to
making that change. Same is true if a new field is added to
'curation_info'. If you want that field to be represented in all
those tables that had previously been created using the LIKE
'curation_info' modifier, you will have to re-CREATE those tables
from scratch.
As I see it, '1' & '3' above are significant drawbacks with no
obvious work-around, but they are not deal breakers. I would still
have reason to want to use both Inheritance and LIKE because of the
efficiencies they provide.
'2' above has a simple and obvious work around which I've been
surprised I've not been able to find posted anywhere (leading me to
believe I must be missing something). PGSQL automatically builds a
SEQUENCE object on SERIAL fields. The name of the SEQUENCE is simply
the concatenated name of the table + SERIAL field with '_seq'
appended to the end - e.g., for the 'book' table, it would be
'book_id_pk_seq'. In order to guarantee the child tables use that
same sequence, you simply declare them as follows instead of using
the SERIAL type:
CREATE TABLE textbook (
id_pk INT8 DEFAULT nextval
('book_id_pk_seq') NOT NULL,
subject_id_fk INT NOT NULL
) INHERITS (book);
ALTER TABLE textbook ADD CONSTRAINT textbook_pk_c PRIMARY KEY(id_pk);
Is it a pain to have to write - and maintain - this extra SQL DDL?
Yes. Having said that, will it provide the desired behavior? Most
definitely yes - again, unless I'm missing something.
So this is how I plan to use INHERIT & LIKE.
My main reason for posting this here, is to get a chance to draw on
the breadth of PostgreSQL experience out there to bring the major
pitfalls in taking this approach to my attention. I think whatever
feedback folks have to offer will be very helpful to me and to others
in search of guidance on this issue.
I also thought it would be helpful to present this for archive
purposes, since, despite the fact much of what I say here is
mentioned elsewhere, I had to search far and wide to find it all and
consolidate my thinking on the topic, so it might save others some
time to see it all in one place.
It might also be worth adding some of this "advice" - if the
consensus is this view is reasonable given the current state of the
Inheritance mechanism in PostgreSQL - to one of the official
PostgreSQL docs - e.g., FAQ, etc..
Many thanks ahead of time for your feedback and patience in reading
this through to the end.
Cheers,
Bill Bug
Bill Bug
Senior Analyst/Ontological Engineer
Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA 19129
215 991 8430 (ph)
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org