----- Original Message -----
From: "Leif B. Kristensen" <leif@xxxxxxxxxxxxxx>
To: <pgsql-general@xxxxxxxxxxxxxx>
Sent: Thursday, February 02, 2006 4:07 AM
Subject: Re: [GENERAL] Primary keys for companies and people
[snip]
I'm very interested to hear what others use in their applications for
holding people and companies.
I've been thinking long and hard about the same thing myself, in
developing my genealogy database. For identification of people, there
seems to be no realistic alternative to an arbitrary ID number.
Still, I'm struggling with the basic concept of /identity/, eg. is the
William Smith born to John Smith and Jane Doe in 1733, the same William
[snip]
I have long been interested in this issue, and it is one that transcends the
problem of IDs in IT. For my second doctorate, I examined this in the
context of historical investigation, applying numerical classification
techniques to biographical information that can be extracted from historical
documents. It is, I fear, a problem for which only a probabilistic answer
can be obtained in most historical cases. For example, there was an
eleventh-century Viking king Harold who as a teenager was part of his
cousin's court, and then found it necessary to flee to Kiev when his cousin
found himself on the losing side of a rebellion. He then made his way into
the Byzantine Empire and served the emperor as a mercenary through much of
the Mediterranean, finally returning in fame and glory to Norway where he
found another relative (a nephew IIRC) on the throne, which he inherited
about a year after his return. Imagine yourself as a historian trying to
write his biography. You'd find various documents all over the western
world (as known in the viking age) written in a variety of languages, and
using different names to refer to him. It isn't an easy task to determine
which documents refer specifically to him. And to make things even more
interesting, many documents refer to a given person only by his official
title, and in other cases, the eldest son of each generation was given the
same name as his father.
In my own case, in the time I was at the University of Toronto, I know of
four other men who had precisely the same name I have. I know this from
strange phone calls from faculty I never studied with about assignments and
examinations for courses I had never taken. In each case, the professor
checked again with the university's records department and found the correct
student. The last case was particularly disturbing because this time I had
actually taken a graduate course with the professor in question. He stopped
me on campus and asked about an assignment for an advanced undergraduate
course that I had not taken, but my namesake had. What made this disturbing
is that not only did the
other student carry my name, but he also looked enough like me that our
professor could mistake me for him on campus! I can only hope that he is a
well behaved, law abiding citizen! The total time period in question was 18
years. In general, the problem only gets harder as the historical period
under consideration grows longer and more remote.
The point is, not only are combinations of family and given names not
reliably unique, even certain biological data, such as photographs of the
human face, are not adequately unique. Even DNA fingerprints, putatively the
best available biometric technology, are not entirely reliable, since they
cannot distinguish between identical twins. At the same time, there are
developmental anomalies, extremely rare as far as we know, in which a person
is in effect his own twin (this results from twin fetuses merging, with the
consequence that the resulting person has some organ systems from one of
the original fetuses and some from the other).
For historical questions, I don't believe one can get any better than
inference based on a balance of probabilities. A genealogist has no option
but to become an applied statistician! For purposes of modern investigation
or for the purpose of modern business, one may do better through an
appropriate use of a combination of technologies. This is a hard problem,
even with the use of the best available technologies and especially given
the current problems associated with identity theft.
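
As a rough illustration of what "inference based on a balance of
probabilities" might look like in a program, here is a minimal Python sketch
that scores how strongly two person records agree, in the spirit of classical
probabilistic record linkage. Everything in it is an assumption made for the
example: the field names, the m/u probabilities, the prior, and the function
names are hypothetical and would have to be estimated from real data.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | both records describe the same person)
#   u = P(field agrees | the records describe different people)
FIELD_PROBS = {
    "surname":    (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "birth_year": (0.85, 0.02),
    "father":     (0.80, 0.01),
}

def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum log2 likelihood ratios over the fields present in both records."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing field is treated as no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total

def match_probability(weight, prior=0.001):
    """Turn a weight into a posterior probability, given an assumed prior."""
    prior_odds = prior / (1 - prior)
    odds = prior_odds * 2 ** weight
    return odds / (1 + odds)

baptism = {"surname": "Smith", "given_name": "William",
           "birth_year": 1733, "father": "John Smith"}
marriage = {"surname": "Smith", "given_name": "William", "birth_year": 1735}

w = match_weight(baptism, marriage)
print(f"weight={w:.2f}  probability={match_probability(w):.3f}")
```

The result is only ever a probability, never a yes or no, which is the point:
the application still has to decide what to do with a pair of records that
are probably, but not certainly, the same person.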
For software developers in general, and database developers in particular,
there are several distinct questions to consider:
1) How does one reliably determine identity to begin with, and then use that
identity with whatever technology one might use to represent it?
2) How good do this technology and identification process need to be? In
other words, how does the cost of a mistake (esp. in identification) relate
to the increased cost of using better technology? In this analysis, one
needs to consider both the cost of such a mistake to the person identified
or misidentified, and the cost to the owner or user of the application or
database. Who will suffer if a mistake is made? Will, or can, bad things
happen if a given person ends up with more than one ID? What is the cost,
and who bears this cost, if more than one person can use the same ID?
3) Can we construct a suite of best practices from which we can select given
specific functional or non-functional constraints as developed for our
application? Included with this question is consideration of protection of
sensitive data in general, and protection of data that might conceivably be
used by cyber-criminals for identity theft, or otherwise used to the harm
of the person so identified.
4) How is biometric data best stored and searched for use in authentication
processes within an arbitrary application? I guess this question assumes
that biometric data needs to be used in every authentication request, but it
occurs to me that for some applications, it may be sufficient to use
biometric data in the creation of a unique user ID, and subsequently to
require it only for certain sensitive processes or resources.
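
Question 2 in the list above can be made concrete with a small expected-cost
calculation. The Python sketch below assumes we already have some estimate of
the probability that two records refer to the same person (however obtained),
plus two cost figures: one for wrongly merging two different people under one
ID, and one for wrongly leaving the same person with two IDs. The function
names and the numbers are hypothetical and purely illustrative.

```python
def expected_costs(p_same, cost_false_merge, cost_false_split):
    """Expected cost of each action, given P(records are the same person)."""
    return {
        # Merging is a mistake only when the records are different people.
        "merge": (1 - p_same) * cost_false_merge,
        # Keeping them apart is a mistake only when they are the same person.
        "keep_separate": p_same * cost_false_split,
    }

def cheapest_action(p_same, cost_false_merge, cost_false_split):
    """Pick the action with the lower expected cost."""
    costs = expected_costs(p_same, cost_false_merge, cost_false_split)
    return min(costs, key=costs.get)

# A duplicate ID on a mailing list is cheap; sharing an ID between two
# patients or two bank customers is not (cost units are arbitrary).
print(cheapest_action(0.9, cost_false_merge=1, cost_false_split=1))
print(cheapest_action(0.9, cost_false_merge=100, cost_false_split=1))
```

With equal costs a 90% match is worth merging, but once a false merge is
assumed to be a hundred times as costly as a duplicate, the same 90% is no
longer good enough. How good the identification needs to be depends on who
pays for each kind of mistake.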
My own feeling is that some options are very easy, and some of these are
adequate for some situations, but that there are others that may be needed
depending on the sensitivity of the data in question or on the potential cost
to one or more parties to a given business process. I expect to be
considering these issues extensively over the next few years since they are
relevant to some of the web applications I am designing. Any insights you,
or others, may have on these questions would be greatly appreciated.
Cheers,
Ted
R.E. (Ted) Byers, Ph.D., Ed.D.
R & D Decision Support Solutions
http://www.randddecisionsupportsolutions.com/