Re: Primary keys for companies and people

"Ted Byers" <r.ted.byers@xxxxxxxxxx> · Thu, 2 Feb 2006 16:57:40 -0500

----- Original Message ----- 
From: "Leif B. Kristensen" <leif@xxxxxxxxxxxxxx>
To: <pgsql-general@xxxxxxxxxxxxxx>
Sent: Thursday, February 02, 2006 4:07 AM
Subject: Re: [GENERAL] Primary keys for companies and people

[snip]
I'm very interested to hear what others use in their applications for
holding people and companies.

I've been thinking long and hard about the same thing myself, in
developing my genealogy database. For identification of people, there
seems to be no realistic alternative to an arbitrary ID number.

Still, I'm struggling with the basic concept of /identity/, eg. is the
William Smith born to John Smith and Jane Doe in 1733, the same William
[snip]

I have long been interested in this issue, and it is one that transcends the 
problem of IDs in IT.  For my second doctorate, I examined this in the 
context of historical investigation, applying numerical classification 
techniques to biographical information that can be extracted from historical 
documents.  It is, I fear, a problem for which only a probabilistic answer 
can be obtained in most historical cases.  For example, there was an 
eleventh-century Viking king Harold who, as a teenager, was part of his 
cousin's court, and then found it necessary to flee to Kiev when his cousin 
found himself on the losing side of a rebellion.  He then made his way into 
the Byzantine Empire and served the emperor as a mercenary through much of 
the Mediterranean, finally returning in fame and glory to Norway, where he 
found another relative (a nephew IIRC) on the throne, which he inherited 
about a year after his return.  Imagine yourself as a historian trying to 
write his biography.  You'd find various documents all over the western 
world (as known in the Viking Age) written in a variety of languages, and 
using different names to refer to him.  It isn't an easy task to determine 
which documents refer specifically to him.  And to make things even more 
interesting, many documents refer to a given person only by his official 
title, and in other cases, the eldest son of each generation was given the 
same name as his father.

In my own case, in the time I was at the University of Toronto, I know of 
four other men who had precisely the same name I have.  I know this from 
strange phone calls from faculty I never studied with about assignments and 
examinations for courses I had never taken.  In each case, the professor 
checked again with the university's records department and found the correct 
student.  The last case was particularly disturbing.  This time I had in 
fact taken a graduate course with the professor in question, and he stopped 
me on campus to ask about an assignment for an advanced undergraduate course 
that I had not taken, but my namesake had.  What made this disturbing is 
that not only did the 
other student carry my name, but he also looked enough like me that our 
professor could mistake me for him on campus!  I can only hope that he is a 
well-behaved, law-abiding citizen!  All of this happened within a span of 18 
years.  In general, the problem only gets harder as the historical period 
considered gets longer and lies further in the past.

The point is that not only are combinations of family and given names not 
reliably unique, but even certain biological data, such as photographs of 
the human face, are not adequately unique.  Even DNA fingerprints, 
putatively the best available biometric technology, are not entirely 
reliable: they cannot distinguish between identical twins, and there are 
developmental anomalies, admittedly extremely rare as far as we know, that 
result in a person being his own twin (twin fetuses merge, with the 
consequence that the resulting person has some organ systems from one of 
the original fetuses and some from the other).  For historical questions, I 
don't believe one can get any better than inference based on a balance of 
probabilities.  A genealogist has no option but to become an applied 
statistician!  For purposes of modern investigation 
or for the purpose of modern business, one may do better through an 
appropriate use of a combination of technologies.  This is a hard problem, 
even with the use of the best available technologies and especially given 
the current problems associated with identity theft.
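
To make "inference based on a balance of probabilities" a little more 
concrete, here is a minimal sketch in Python of weighted-evidence record 
matching, loosely in the style of Fellegi-Sunter record linkage.  The field 
names, the m/u probabilities and the prior are all invented for 
illustration; in practice they would have to be estimated from the 
documents at hand.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | both records describe the same person)
#   u = P(field agrees | records describe different people)
FIELD_PROBS = {
    "given_name": (0.90, 0.05),
    "family_name": (0.95, 0.02),
    "birth_year": (0.85, 0.01),
    "birth_place": (0.80, 0.03),
}


def match_weight(rec_a, rec_b, probs=FIELD_PROBS):
    """Sum log2 likelihood ratios over the fields present in both records."""
    total = 0.0
    for field, (m, u) in probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing field is treated as no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def match_probability(weight, prior=0.001):
    """Turn a weight into a posterior probability that the records match.

    `prior` is the assumed chance that a randomly chosen pair of records
    refers to the same person (again, an invented figure).
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2 ** weight
    return posterior_odds / (1 + posterior_odds)


baptism = {"given_name": "William", "family_name": "Smith", "birth_year": 1733}
marriage = {"given_name": "William", "family_name": "Smith", "birth_year": 1735}
print(match_probability(match_weight(baptism, marriage)))
```

The result is never a yes or no, only a probability, and the threshold at 
which two records are treated as the same person remains a judgment call.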

For software developers in general, and database developers in particular, 
there are several distinct questions to consider:

1) How does one reliably determine identity to begin with, and then use that 
identity with whatever technology one might use to represent it?

2) How good do this technology and the identification process need to be?  In 
other words, how does the cost of a mistake (esp. in identification) relate 
to the increased cost of using better technology?  In this analysis, one 
needs to consider both the cost of such a mistake to the person identified 
or misidentified, and the cost to the owner or user of the application or 
database.  Who will suffer if a mistake is made?  Will, or can, bad things 
happen if a given person ends up with more than one ID?  What is the cost, 
and who bears this cost, if more than one person can use the same ID?

3) Can we construct a suite of best practices from which we can select given 
specific functional or non-functional constraints as developed for our 
application?  This question includes the protection of sensitive data in 
general, and in particular of data that cyber-criminals might conceivably 
use for identity theft or otherwise to the harm of the person so identified.

4) How is biometric data best stored and searched for use in authentication 
processes within an arbitrary application?  I guess this question assumes 
that biometric data needs to be used in an authentication request.  It 
occurs to me that for some applications it may be sufficient to use 
biometric data when creating a unique user ID, and subsequently to require 
it only for certain sensitive processes or resources.
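
The trade-off in question 2 can be sketched as a simple expected-cost 
comparison.  This is only an illustration in Python: the scheme names, 
error rates and cost figures below are hypothetical, and a real analysis 
would also have to account for who bears each cost.

```python
from dataclasses import dataclass


@dataclass
class IdScheme:
    name: str
    fixed_cost: float        # cost of deploying the scheme
    false_match_rate: float  # chance, per identification, that two people share one ID
    false_split_rate: float  # chance, per identification, that one person gets a second ID


def expected_cost(scheme, n_identifications, cost_false_match, cost_false_split):
    """Deployment cost plus the expected cost of identification mistakes."""
    error_cost = n_identifications * (
        scheme.false_match_rate * cost_false_match
        + scheme.false_split_rate * cost_false_split
    )
    return scheme.fixed_cost + error_cost


def cheapest(schemes, **costs):
    """Return the scheme with the lowest expected cost."""
    return min(schemes, key=lambda s: expected_cost(s, **costs))


# Invented figures, for illustration only.
name_only = IdScheme("name only", 0.0, 0.01, 0.02)
surrogate = IdScheme("surrogate ID plus verification", 5000.0, 0.0001, 0.001)

best = cheapest(
    [name_only, surrogate],
    n_identifications=10_000,
    cost_false_match=500.0,
    cost_false_split=20.0,
)
print(best.name)
```

With few identifications the cheap scheme can win, and with many the better 
technology pays for itself, which is the sense in which "how good does it 
need to be" depends on the application.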

My own feeling is that some options are very easy, and some of these are 
adequate for some situations, but that there are others that may be needed 
depending on the sensitivity of the data in question or on the potential cost 
to one or more parties to a given business process.  I expect to be 
considering these issues extensively over the next few years since they are 
relevant to some of the web applications I am designing.  Any insights you, 
or others, may have on these questions would be greatly appreciated.

Cheers,

Ted

R.E. (Ted) Byers, Ph.D., Ed.D.
R & D Decision Support Solutions
http://www.randddecisionsupportsolutions.com/