
Re: Primary keys for companies and people


 




----- Original Message -----
From: "Leif B. Kristensen" <leif@xxxxxxxxxxxxxx>
To: <pgsql-general@xxxxxxxxxxxxxx>
Sent: Thursday, February 02, 2006 4:07 AM
Subject: Re: [GENERAL] Primary keys for companies and people


[snip]
I'm very interested to hear what others use in their applications for
holding people and companies.

I've been thinking long and hard about the same thing myself, in
developing my genealogy database. For identification of people, there
seems to be no realistic alternative to an arbitrary ID number.

Still, I'm struggling with the basic concept of /identity/, eg. is the
William Smith born to John Smith and Jane Doe in 1733, the same William
[snip]

I have long been interested in this issue, and it is one that transcends the problem of IDs in IT. For my second doctorate, I examined this in the context of historical investigation, applying numerical classification techniques to biographical information that can be extracted from historical documents. It is, I fear, a problem for which only a probabilistic answer can be obtained in most historical cases. For example, there was an eleventh-century Viking king Harold who as a teenager was part of his cousin's court, and then found it necessary to flee to Kiev when his cousin found himself on the losing side of a rebellion. He then made his way into the Byzantine Empire and served the emperor as a mercenary through much of the Mediterranean, finally returning in fame and glory to Norway, where he found another relative (a nephew IIRC) on the throne, which he inherited about a year after his return. Imagine yourself as a historian trying to write his biography. You'd find various documents all over the western world (as known in the Viking age) written in a variety of languages, and using different names to refer to him. It isn't an easy task to determine which documents refer specifically to him. And to make things even more interesting, many documents refer to a given person only by his official title, and in other cases, the eldest son of each generation was given the same name as his father.

In my own case, in the time I was at the University of Toronto, I know of four other men who had precisely the same name I have. I know this from strange phone calls from faculty I never studied with about assignments and examinations for courses I had never taken. In each case, the professor checked again with the university's records department and found the correct student. The last case was particularly disturbing, since I had actually taken a graduate course with the professor in question, and he stopped me on campus and asked about an assignment for an advanced undergraduate course that I had not taken, but my namesake had. What made this disturbing is that not only did the other student carry my name, but he also looked enough like me that our professor could mistake me for him on campus! I can only hope that he is a well-behaved, law-abiding citizen! The total time period in question was 18 years. In general, the problem only gets more challenging as the length, and as the age, of the historical period considered increases.

The point is, not only are the combinations of family and given names not reliably unique, even certain biological data, such as photographs of the human face, are not adequately unique. Even DNA fingerprints, putatively the best available biometric technology, are not entirely reliable, since they cannot distinguish between identical twins, and at the same time, there can be developmental anomalies (admittedly extremely rare, as far as we know) resulting in a person being his own twin (this results from twin fetuses merging, with the consequence that the resulting person has some organ systems from one of the original fetuses and some from the other). For historical questions, I don't believe one can get any better than inference based on a balance of probabilities. A genealogist has no option but to become an applied statistician! For purposes of modern investigation or for the purpose of modern business, one may do better through an appropriate use of a combination of technologies. This is a hard problem, even with the use of the best available technologies and especially given the current problems associated with identity theft.
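
To make the "balance of probabilities" idea concrete, here is a minimal sketch in Python of one way to combine several pieces of evidence about whether two records refer to the same person. It is only an illustration, not a method taken from any particular study: the function name, the field descriptions and all of the numbers are invented for the example, and it assumes the compared fields are independent, which real records rarely are.

```python
from math import prod


def match_probability(prior, likelihood_ratios):
    """Posterior probability that two records describe the same person.

    prior: probability of a match before any fields are compared.
    likelihood_ratios: one value per compared field, each being
        P(observation | same person) / P(observation | different people).
    Assumption: the fields are treated as independent.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical likelihood ratios for two historical records (made-up values).
evidence = {
    "same given name and surname": 20.0,
    "birth years within two years": 8.0,
    "recorded in different parishes": 0.5,
}

p = match_probability(prior=0.001, likelihood_ratios=evidence.values())
print(f"probability the records refer to the same person: {p:.3f}")
```

With these invented numbers, even a matching name and a close birth year leave the identification far from certain, which is the point: the answer is a probability, not a yes or no.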

For software developers in general, and database developers in particular, there are several distinct questions to consider:

1) How does one reliably determine identity to begin with, and then use that identity with whatever technology one might use to represent it?

2) How good do this technology and the identification process need to be? In other words, how does the cost of a mistake (esp. in identification) relate to the increased cost of using better technology? In this analysis, one needs to consider both the cost of such a mistake to the person identified or misidentified, and the cost to the owner or user of the application or database. Who will suffer if a mistake is made? Will, or can, bad things happen if a given person ends up with more than one ID? What is the cost, and who bears this cost, if more than one person can use the same ID?

3) Can we construct a suite of best practices from which we can select, given the specific functional or non-functional constraints developed for our application? Included with this question is consideration of protection of sensitive data in general, and protection of data that might conceivably be used by cyber-criminals in activity related to identity theft, or otherwise used to the harm of the person so identified.

4) How is biometric data best stored and searched for use in authentication processes within an arbitrary application? I guess this question assumes that biometric data needs to be used in every authentication request, and it occurs to me that for some applications, it may be sufficient to use biometric data in the creation of a unique user id, and that it may subsequently be needed only for certain sensitive processes or resources.
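
The trade-off in question 2 can be sketched as a simple expected-cost comparison. The following Python fragment is only an illustration under stated assumptions: the two identification schemes, their error rates and every cost figure are hypothetical, and a real analysis would also have to account for who bears each cost.

```python
from dataclasses import dataclass


@dataclass
class IdScheme:
    name: str
    cost_per_person: float  # cost of running the identification process once
    p_duplicate_id: float   # chance one person ends up with more than one ID
    p_shared_id: float      # chance more than one person can use the same ID


def expected_cost(scheme, cost_duplicate, cost_shared):
    """Process cost plus the probability-weighted cost of each kind of mistake."""
    return (scheme.cost_per_person
            + scheme.p_duplicate_id * cost_duplicate
            + scheme.p_shared_id * cost_shared)


def cheapest(schemes, cost_duplicate, cost_shared):
    """Return the scheme with the lowest expected cost per person."""
    return min(schemes, key=lambda s: expected_cost(s, cost_duplicate, cost_shared))


# Hypothetical schemes with made-up costs and error rates.
schemes = [
    IdScheme("name + birth date", 0.10, 0.02, 0.01),
    IdScheme("arbitrary ID + document check", 2.00, 0.005, 0.0001),
]

# Mistakes are cheap (e.g. a mailing list) versus expensive (e.g. financial records).
low_stakes = cheapest(schemes, cost_duplicate=1.0, cost_shared=10.0)
high_stakes = cheapest(schemes, cost_duplicate=50.0, cost_shared=5000.0)
print("low stakes: ", low_stakes.name)
print("high stakes:", high_stakes.name)
```

The design point is simply that the "best" scheme depends on the cost of a mistake: the same two options rank differently once the cost of two people sharing an ID becomes large.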

My own feeling is that some options are very easy, and some of these are adequate for some situations, but that there are others that may be needed depending on the sensitivity of the data in question or on the potential cost to one or more parties to a given business process. I expect to be considering these issues extensively over the next few years since they are relevant to some of the web applications I am designing. Any insights you, or others, may have on these questions would be greatly appreciated.

Cheers,

Ted

R.E. (Ted) Byers, Ph.D., Ed.D.
R & D Decision Support Solutions
http://www.randddecisionsupportsolutions.com/


