On Thu, Feb 02, 2006 at 10:36:54AM +0000, David Goodenough wrote:
> > Still, I'm struggling with the basic concept of /identity/, eg. is the
> > William Smith born to John Smith and Jane Doe in 1733, the same William
> > Smith who marries Mary Jones in the same parish in 1758? You may never
> > really know. Still, collecting such disparate "facts" under the same ID
> > number, thus taking the identity more or less for granted, is the modus
> > operandi of computer genealogy. Thus, one of the major objectives of
> > genealogy research, the assertion of identity, becomes totally hidden
> > the moment that you decide to cluster disparate evidence about what may
> > actually have been totally different persons, under a single ID number.
> >
> > The alternative is of course to collect each cluster of evidence under a
> > separate ID, but then the handling of a "person" becomes a programmer's
> > nightmare.
>
> There is also the problem that a name can change. People change names
> by deed-poll, and also women can adopt a married name or keep their old
> one. All in all an ID is about the only answer.

True, the issue being of course that changing a name doesn't change
their identity.

To the GP, your page is an interesting one and raises several
interesting points. In particular the one about the "person" being the
conclusion of the rest of the database. You essentially have a set of
facts "A married B in C on date D" and you're trying to correlate these.
In the end it's just a certain amount of guesswork, especially since
back then they weren't as particular about spelling as we are today.

My naive view is that you're basically assigning trust values to each
fact and to the chance that two citations refer to the same person. In
principle you'd be able to cross-reference all these citations and build
the structure quasi-automatically. I suppose in practice this is done by
hand.

As for your question, I think you're stuck with having a person ID,
basically because you need to identify a person somehow. Given you still
have the original citations, you can split a person into several if the
single identity turns out not to work.

One thing I find odd though: your "person" objects have no birthdate or
deathdate. Or birth place either. I would have thought these elements
would be fundamental in determining if two people are the same, given
that they can't change and people are unlikely to forget them. Put
another way, two people with the same birthday in the same place with
similar names are very likely to be the same. If you can demonstrate
this is not the case, that's another fact. In the end you're dealing
with probabilities; you can never know for sure.

Anyway, hope this helps. It's a subject I've been vaguely interested in
but never really had the time to look into.

Have a nice day,
--
Martijn van Oosterhout <kleptog@xxxxxxxxx> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
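
A rough sketch of the trust-value idea from the message above, in
Python. Everything in it is an assumption made for illustration: the
Citation fields, the weights, the sample records (including the place
names), and the use of difflib for fuzzy name matching. None of it comes
from the GP's schema, and the scores are not calibrated probabilities.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """One recorded fact, kept separate from any conclusion about identity."""
    cid: int
    name: str
    birth_year: Optional[int] = None
    birth_place: Optional[str] = None
    trust: float = 1.0  # hypothetical 0..1 confidence in the source itself


def name_similarity(a: str, b: str) -> float:
    """Fuzzy, case-insensitive match, since old spellings are inconsistent."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_person_score(a: Citation, b: Citation) -> float:
    """Crude 0..1 score that two citations describe the same person.

    The weights (0.5 / 0.3 / 0.2) are arbitrary placeholders.
    """
    score = 0.5 * name_similarity(a.name, b.name)
    if a.birth_year is not None and a.birth_year == b.birth_year:
        score += 0.3
    if a.birth_place is not None and a.birth_place == b.birth_place:
        score += 0.2
    # A weak source drags down whatever it is compared with.
    return score * min(a.trust, b.trust)


# Invented sample records, for illustration only.
rec_a = Citation(101, "William Smith", 1733, "Ashford")
rec_b = Citation(102, "Willm. Smyth", 1733, "Ashford", trust=0.8)
rec_c = Citation(103, "Thomas Brown", 1760, "Dover")

# A "person" is only a conclusion: a set of citation IDs under one person ID.
# Splitting a person later just means moving citation IDs to a new person ID.
persons = {1: {rec_a.cid, rec_b.cid}}

print(same_person_score(rec_a, rec_b) > same_person_score(rec_a, rec_c))
```

Because the citations keep their own IDs, the person ID stays a revisable
grouping rather than a hidden assumption: a low score between two
citations under the same person is a hint that the cluster may need
splitting.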