Re: Need magic for identifieing double adresses

Gary Chambers <gwchamb@xxxxxxxxx> · Wed, 15 Sep 2010 23:01:25 -0400

Andreas,

> Relevant fields could be  name, street, zip, city, phone
> Is there a way to do something like this with postgresql ?
> I fear this will need still a lot of manual sorting and searching even when
> potential peers get automatically identified.

One of the techniques I use to increase the odds of detecting
duplicates is to trim each column, remove all internal whitespace,
coalesce it into a single string, and calculate an MD5 (some other
hash function may be better) hash.  It's not perfect (we are dealing
with humans, after all), but it helps.

-- Gary Chambers

/* Nothing fancy and nothing Microsoft! */

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general