Dear pgsql-general moderators,
I sent this post out over the weekend.
Is there a reason why it is not getting posted to the list?
Many thanks for your assistance. I really need some advice on this
issue from people with experience using both inheritance mechanisms
in PostgreSQL.
Cheers,
Bill Bug
Hi All,
Sorry to bring up the topic of PostgreSQL inheritance again, but
after going through the archives and Google results, I still don't
have a clear sense of whether my plan to implement a schema I'm
working on is the preferred way to go.
First, I'd like to find out if the way I'm thinking about
Inheritance vs. the SQL DDL CREATE TABLE modifier LIKE is correct.
The simplest analogy I can think of from OO techniques is PGSQL
Inheritance corresponds to Class Inheritance, while LIKE is more
like an inheritable Interface (Java) or Mixin (Ruby). Inheritance
maintains strict hierarchical relationships propagating the "Class"
identity down through to all progeny. LIKE on the other hand simply
provides a means to re-use a set of fields in an unlimited number
tables without having to redefine those fields for each table you
use them in.
This view is incomplete and far from a perfect fit to the way PGSQL
Inheritance & LIKE work, but I think it's a helpful way of thinking
of these 2 related mechanisms, when trying to decide how and when
to use them in their current form. As has been mentioned many
times in posts here, as well as in the PGSQL docs, PGSQL
Inheritance is only partial. Table fields are propagated as well as
the group identity, but no other RDBMS objects created on the
parent ( INDEXES, CONSTRAINTS & SEQUENCES primarily) are
inherited. As has been endlessly stated in posts here and
elsewhere, this is a significant short-coming for the PGSQL
Inheritance mechanism which those of us desirous of using
Inheritance would love to see fixed (I understand it has been on
the TODO list for many years, as this mechanism has been in the
PGSQL code base for over 15 years).
I don't agree this makes PGSQL Inheritance unusable. There are
situations where I think it can still be useful, and I describe one
below. I'd welcome feedback on that opinion, however, as I'd hate
to have my relative ignorance doom the data schema I'm about to
fill with a few million rows of data to serious problems later.
The following is an example of using both Inheritance and LIKE in
the context described above.
CREATE TABLE curation_info (
created_by TEXT NOT NULL,
create_date TIMESTAMP WITH TIME ZONE,
modified_by TEXT NOT NULL,
mod_date TIMESTAMP WITH TIME ZONE
);
CREATE TABLE book (
id_pk SERIAL PRIMARY KEY,
title TEXT NOT NULL,
author_id_fk INT NOT NULL,
publisher_id_fk INT NOT NULL,
pub_year DATE NOT NULL,
total_pages INT NOT NULL
LIKE curation_info
);
CREATE TABLE novel (
id_pk SERIAL PRIMARY KEY,
genre_id_fk INT NOT NULL
) INHERITS (book);
CREATE TABLE textbook (
id_pk SERIAL PRIMARY KEY,
subject_id_fk INT NOT NULL
) INHERITS (book);
CREATE TABLE publisher (
id_pk SERIAL PRIMARY KEY,
name TEXT NOT NULL,
address_id_fk INT NOT NULL,
LIKE curation_info
);
CREATE TABLE author (
id_pk SERIAL PRIMARY KEY,
last_name TEXT NOT NULL,
first_name TEXT NOT NULL,
middle_name TEXT NOT NULL,
address_id_fk INT NOT NULL,
LIKE curation_info
);
This is not the best way to model book info (for instance, books
are only allowed to have 1 author in this schema), but it will help
me to make my point.
Books, novels and textbooks will be considered equivalent in the
context of many queries. At the same time, there will be other
queries where it will be important to consider novels & textbooks
as distinct entities. The PGSQL Inheritance mechanism easily
supports both of these situations.
The curation fields listed in the 'curation_info' table are found
ubiquitously in tables throughout many data schema. However, it is
not likely there would be a circumstance where you would want to
consider all tables containing these fields "curatable entities" to
be queried as a group. That simply makes no sense. In this case,
LIKE seems to be the best way to propagate these fields, since it
doesn't couple all tables containing them to the parent
'curation_info' table.
As I see it, there are at least 3 major problems with adopting such
a schema - despite the obvious efficiencies it offers (most of
which have been reported elsewhere):
1) none of the parent table ('book') CONSTRAINTS or INDEXES are
propagated to the children. This means if you want the children to
have the same CONSTRAINTS - as you probably will - you need to
build them yourself for each child table.
2) the primary keys generated across the book, novel & textbook
tables are completely uncouple and will definitely collide. In
other words, due to the fact that neither the SEQUENCE behind the
'book.id_pk' SERIAL field, not the PK CONSTRAINTS & INDEX that
comes with that field will automatically propagate to the child
tables. That is why the SQL DDL given above has an 'id_pk' SERIAL
field in all 3 tables. There may be some conditions where you want
those PKs to be independent from one another, but those will be
much less frequent than the times when you will require they all
derive from the same SEQUENCE.
3) The fields inherited from the 'curation_info' table via the
LIKE modifier are in no way linked back to the table from which
they originated, unlike a an Interface (in Java) or Mixin (in Ruby)
would be. If the 'mod_date' field is remove from 'curation_info'
it will still remain in all the tables created using
'curation_info' prior to making that change. Same is true if a new
field is added to 'curation_info'. If you want that field to be
represented in all those tables that had previously been created
using the LIKE 'curation_info' modifier, you will have to re-CREATE
those tables from scratch.
As I see it, '1' & '3' above are significant drawbacks with no
obvious work-around, but they are not deal breakers. I would still
have reason to want to use both Inheritance and LIKE because of the
efficiencies they provide.
'2' above has a simple and obvious work around which I've been
surprised I've not been able to find posted anywhere (leading me to
believe I must be missing something). PGSQL automatically builds a
SEQUENCE object on SERIAL fields. The name of the SEQUENCE is
simply the concatenated name of the table + SERIAL field with
'_seq' appended to the end - e.g., for the 'book' table, it would
be 'book_id_pk_seq'. In order to guarantee the child tables use
that same sequence, you simply declare them as follows instead of
using the SERIAL type:
CREATE TABLE textbook (
id_pk INT8 DEFAULT nextval
('book_id_pk_seq') NOT NULL,
subject_id_fk INT NOT NULL
) INHERITS (book);
ALTER TABLE textbook ADD CONSTRAINT textbook_pk_c PRIMARY KEY(id_pk);
Is it a pain to have to write - and maintain - this extra SQL DDL?
Yes. Having said that, will it provide the desired behavior? Most
definitely yes - again, unless I'm missing something.
So this is how I plan to use INHERIT & LIKE.
My main reason for posting this here, is to get a chance to draw on
the breadth of PostgreSQL experience out there to bring the major
pitfalls in taking this approach to my attention. I think whatever
feedback folks have to offer will be very helpful to me and to
others in search of guidance on this issue.
I also thought it would be helpful to present this for archive
purposes, since, despite the fact much of what I say here is
mentioned elsewhere, I had to search far and wide to find it all
and consolidate my thinking on the topic, so it might save others
some time to see it all in one place.
It might also be worth adding some of this "advice" - if the
consensus is this view is reasonable given the current state of the
Inheritance mechanism in PostgreSQL - to one of the official
PostgreSQL docs - e.g., FAQ, etc..
Many thanks ahead of time for your feedback and patience in reading
this through to the end.
Cheers,
Bill Bug
P.S.: Should this end up double posted, I apologize. There was a
problem with the list accepting my aliased email address.
Bill Bug
Senior Analyst/Ontological Engineer
Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA 19129
215 991 8430 (ph)
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq