
Re: Need magic for identifying double addresses

On 16.09.2010 13:18, Sam Mason wrote:
> On Thu, Sep 16, 2010 at 04:40:42AM +0200, Andreas wrote:
> > I need to clean up a lot of contact data because of a merge of customer
> > lists that used to be kept separate.
> What to do depends on how much data you have; a few thousand and you can
> do lots of fiddling by hand, whereas if you have a few tens of millions
> of people you want to try and do more with code.

Thanks Sam,
I'll check out fuzzystrmatch.
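For reference, fuzzystrmatch's `levenshtein(text, text)` returns the edit distance between two strings: the minimum number of single-character insertions, deletions and substitutions needed to turn one into the other. A minimal Python sketch of the same metric is shown below for experimenting on a sample of the data outside the database; the function name `levenshtein` here is only illustrative and is not a library API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (case-sensitive)."""
    # Keep only the previous row of the dynamic-programming table,
    # so memory use is O(len(b)) instead of O(len(a) * len(b)).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


print(levenshtein("Strawberry Street", "Strawberrystreet"))  # 2
```

A distance of 2 (drop the space, change "S" to "s") flags that pair as a likely duplicate. Edit distance compares characters in order, though, so reordered words such as "Bakery Miller" and "Miller's Bakery" still look quite different to it.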

We are talking about nearly 500,000 records with considerable overlap.
It's not only typos I need to catch. There are also variations in how things are written that aren't necessarily wrong.
e.g.
Miller's Bakery
Bakery Miller
Bakery Miller, Ltd.
Bakery Miller and sons
Bakery Smith (formerly Miller)

and the usual street variants:
Strawberry Street
Strawberrystreet
Strawberry Str.42
Strawberry Str. 42
Strawberry Str. 42-45
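Because many of the variants above differ in word order, filler words or abbreviations rather than in spelling, one possible approach (not something proposed in the thread) is to normalize each value into a key and group records by exact key before any fuzzy comparison. Grouping by key is linear in the number of records, whereas comparing every pair of 500,000 records means roughly 125 billion comparisons. The sketch below assumes English-style names and addresses; `FILLER_WORDS`, the street-suffix pattern and the helper names `company_key`, `street_key` and `group_candidates` are all hypothetical and would need tuning against the real data.

```python
import re
from collections import defaultdict

# Hypothetical list of words that carry no identifying information in a company name.
FILLER_WORDS = {"ltd", "and", "sons", "the"}


def company_key(name: str) -> str:
    """Lowercase, drop a possessive 's, punctuation and filler words, then sort the words."""
    s = re.sub(r"'s\b", "", name.lower())
    tokens = [t for t in re.findall(r"\w+", s) if t not in FILLER_WORDS]
    return " ".join(sorted(tokens))


def street_key(address: str) -> tuple[str, str]:
    """Split an address line into a canonical (street name, house number) pair."""
    s = address.lower().strip()
    # House number or range (e.g. "42", "42-45") at the end, possibly glued on as in "str.42".
    m = re.search(r"(\d+(?:\s*-\s*\d+)?)\s*$", s)
    number = re.sub(r"\s+", "", m.group(1)) if m else ""
    name = s[: m.start()] if m else s
    # Heuristic: drop a trailing "street" / "str." suffix, separated or glued on.
    name = re.sub(r"(street|str\.?)\s*$", "", name.strip())
    # Remove spaces and punctuation (Unicode letters such as umlauts are kept).
    name = re.sub(r"\W+", "", name)
    return name, number


def group_candidates(records, key):
    """Group records by a normalized key; return only groups with more than one record."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return {k: v for k, v in groups.items() if len(v) > 1}


print(group_candidates(
    ["Miller's Bakery", "Bakery Miller", "Bakery Smith (formerly Miller)"], company_key))
# {'bakery miller': ["Miller's Bakery", 'Bakery Miller']}
```

With these rules, all five Strawberry Street variants share the street name "strawberry", and the number part separates "42" from "42-45". "Bakery Smith (formerly Miller)" deliberately gets its own key, so cases like that still need manual review or a fuzzy comparison (for example `levenshtein()`) within smaller candidate groups.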


Regards
Andreas



--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

