Re: Column order in multi column primary key

Kevin Grittner <kgrittn@xxxxxxxxx> · Mon, 8 Aug 2016 16:44:18 -0500

On Mon, Aug 8, 2016 at 4:01 PM, Craig Boucher <craig@xxxxxxxxxx> wrote:
> From: Tom Lane [mailto:tgl@xxxxxxxxxxxxx]

>> I'm pretty skeptical of the notion of redefining what your PK
>> is on performance grounds.  With this definition, you'd allow
>> two entries with the same work_session_id, if they chanced to
>> have different customer_ids.  Is that really OK?

> It could actually be beneficial if we need to migrate a customer
> from one database to another because wouldn't have to worry about
> pk constraint violations.

Isn't "wouldn't have to worry about pk constraint violations"
another way of saying "we're OK with allowing the same logical row
to be present multiple times in the table with inconsistent data
among the various copies"?  You may want to read up on data
normalization and shoot for 3rd normal form and its benefits.

The short version you can use as a memory device when mentally
testing whether a key is write is that every non-key column should
be functionally dependent on the key, the whole key, and nothing
but the key ("so help you Codd").  That means that:
 (1) given the key values you can determine the values of all other
columns (1st normal form),
 (2) if you omit any of the key columns from the key you cannot
determine the values of all other columns (2nd normal form), and
 (3) no column's value may be determined from non-key columns, and
therefore only determined from the key indirectly (3rd normal form).

You are talking about degrading your normalization to 1st normal
form.  There is a wealth of literature on the problems that can
introduce.  What would be in line with well-established theory and
practice is looking for a "natural key" in each table -- some
combination of columns which naturally occur in the table which
uniquely identify the rows.  In some cases it is necessary to add
some "tie-breaker" column to the end of the key when you do this --
like a sequence within the group or a timestamp.

If you search the archives you will find periodic discussions of
the relative merits of this approach versus adding a meaningless
synthetic key (often called "id" in every table) to use by itself
as the primary key.  This is essentially the same as adding
"pointers" among the various records and constraining how
"navigation" among tables can happen.  It has a big performance
downside in generating statistics and large reports because it
requires a lot of pointer chasing.  Proponents will point out how
convenient it is to be able to change human-visible identifying
values, potentially on a very large scale, by modifying one column
of one row.  That is, of course, a double-edged sword -- in
discussing design alternatives with the CPAs who were going to be
auditing financial data stored in a database, they didn't tend to
see that as nearly as much of a plus as some programmers do.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general