Re: primary keys

Merlin Moncure <mmoncure@xxxxxxxxx> · Sat, 12 Sep 2009 14:23:48 -0400

On Sat, Sep 12, 2009 at 2:11 PM, Bill Moran <wmoran@xxxxxxxxxxxxxxxxx> wrote:
> On Sun, 13 Sep 2009 02:35:02 +1000
> Grant Maxwell <grant.maxwell@xxxxxxxxxxxx> wrote:
>
>> Hi Folks
>>
>> I'm looking for a bit of advice regarding alpha primary keys.
>>
>> I have a table (designed by someone else) that has a numeric primary
>> key and also a unique non-null email address field.
>>
>> The use of the primary key is causing me some headaches in that, in a
>> multiple database server environment, each server allocates a unique
>> number from a range (and that works fine), but when the table is
>> replicated (master-master-master) the exception handling is a bit
>> tricky because each database server may have records that are
>> duplicates in the email address field, each with a different primary
>> key number.
>>
>> I don't know why it was done this way, but it seems to me that the
>> email addresses are unique and non-null, so they could be used as the
>> primary key. This would make the replication much faster and simpler.
>>
>> Does anyone out there think the change (number to email address as
>> primary key) would be a bad idea?
>>
>> Your thoughts would be appreciated.
>
> I think you have two issues here, and if you identify them as such, you'll
> come up with a better decision.
>
> The first is, "Should the email address be my primary key".  And the answer
> to that really depends on whether you expect it to change a lot, whether you
> have a lot of FK relations and how easy/difficult it will be to manage those.
>
> The second, which may influence the first, is "how do I manage conflicts
> when data from different sources is merged?" which is a huge, complex
> question.  One potential answer is to replace your integer surrogate key
> with a GUID key, which will be unique from all the different sources.  You

guid solves the surrogate issue wrt data transfer, but glosses over
what happens when you have duplicates.
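
A minimal sketch of that point, assuming PostgreSQL 13+ (where gen_random_uuid() is built in; older versions need the pgcrypto extension). The schema guid_key, the table subscriber and the addresses are hypothetical. With a uuid key and no constraint on email, the "same" person arriving from two servers simply becomes two rows, so the duplicate is never even reported:

```sql
-- Hypothetical schema so this example runs on its own.
create schema guid_key;
set search_path = guid_key;

create table subscriber (
  id    uuid primary key default gen_random_uuid(),  -- PostgreSQL 13+, else pgcrypto
  email text not null                               -- deliberately not unique
);

insert into subscriber (email) values ('pat@example.com');  -- row from server A
insert into subscriber (email) values ('pat@example.com');  -- row from server B

-- Both rows were accepted: each got its own uuid, so the key never collided.
select email, count(*) as copies
from subscriber
group by email
having count(*) > 1;
```

The uuid only guarantees that row ids don't collide; deciding which of the two rows is "really" pat is still left to the application.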

> could also use a "prefix" system to ensure uniqueness.  Another option is
> to use the email address itself.  I'm surprised by your approach, as
> personally, I've seen a LOT of people who share a single email address
> (husband/wife, for example).  It's not horribly common, since getting

that's going to depend on how it's defined in the app. couple of
different approaches:

email _must_ be unique, and we care not to whom it belongs:
create table email (email text primary key);
create table contact (email text references email /* [...] */);
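
In this layout the merge conflict the original poster describes becomes something the key can handle directly. A rough sketch, assuming PostgreSQL 9.5 or later for INSERT ... ON CONFLICT; the schema name unique_email and the sample addresses are hypothetical. Rows arriving from another server whose address already exists are skipped rather than creating a second row:

```sql
-- Hypothetical schema so this example runs on its own.
create schema unique_email;
set search_path = unique_email;

create table email (email text primary key);

-- Already present locally.
insert into email values ('pat@example.com');

-- Batch replicated from another server: 'pat@example.com' collides with
-- the primary key and is skipped; 'lee@example.com' is new and inserted.
-- (ON CONFLICT requires PostgreSQL 9.5 or later.)
insert into email values ('pat@example.com'), ('lee@example.com')
on conflict (email) do nothing;
```

DO NOTHING is just one policy; an application that needs to merge attributes would use ON CONFLICT ... DO UPDATE instead.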

we do care option 1:
create table email (email text, memo text, primary key (email, memo));
create table contact (email text, memo text,
  foreign key (email, memo) references email (email, memo));

the memo field stores whatever info distinguishes the different users

we do care option 2:
create table email (email text primary key);
create table contact (contact_id serial primary key);
create table email_contact_map
(
  contact_id int references contact,
  email text references email,
  memo text, -- 'person a', 'person b', etc
  primary key (contact_id, email)
);
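
A short usage sketch of option 2, assuming PostgreSQL; the schema name opt2 and the sample address are hypothetical. A husband/wife pair sharing one address is stored as one email row and two contacts, and the map row is where the "who is who" information lives:

```sql
-- Hypothetical schema so this example runs on its own.
create schema opt2;
set search_path = opt2;

create table email (email text primary key);
create table contact (contact_id serial primary key);
create table email_contact_map
(
  contact_id int references contact,
  email text references email,
  memo text,
  primary key (contact_id, email)
);

insert into email values ('smiths@example.com');
insert into contact default values;  -- contact_id 1 (fresh sequence)
insert into contact default values;  -- contact_id 2

-- One address, two people; the memo records which person is which.
insert into email_contact_map values
  (1, 'smiths@example.com', 'person a'),
  (2, 'smiths@example.com', 'person b');

select contact_id, memo
from email_contact_map
where email = 'smiths@example.com'
order by contact_id;
```

The address itself is stored once and stays unique; the distinction between its users is explicit data rather than an accident of surrogate keys.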

The point is this: if you (the user) need to distinguish between the
users of an email address, that information should be in the
database. Allowing multiple entries of the same email address via a
serial or guid key just lets you sneak past this requirement.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general