I have a couple of tables (people and addresses) that use serials as primary keys and contain a lot of potentially duplicate data. The problem is that the data was not entered carefully. For example, there are first_name, middle_name and last_name fields, but the database could contain Samuel L Jackson, Samuel Jackson, Sam Jackson and even Jackson L Samuel (data in the wrong fields), all representing the same person.

I have been thinking about some algorithms that might identify the duplicate records, but I am no mathematician, so I thought I would ask here before spending a lot of time on a problem that has already been solved. Postgres has lots of great functionality in the fuzzystrmatch module, so I am sure it can excel at this kind of thing.

Any ideas or links to documents would be much appreciated.

Cheers.

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
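
To make the problem concrete, a rough sketch in Python follows. It is only an illustration of the matching problem, not a worked-out solution. It ignores which field each name part was typed into by sorting the tokens, then scores every pair of rows with the standard library's difflib.SequenceMatcher. The function names, the sample rows and the 0.7 threshold are all made up for illustration. In the database itself, something from fuzzystrmatch (for instance levenshtein or soundex) would presumably take the place of SequenceMatcher.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_key(first, middle, last):
    """Lowercase the name parts and sort the tokens, so that
    data typed into the wrong field still yields the same key."""
    tokens = " ".join(p for p in (first, middle, last) if p).lower().split()
    return " ".join(sorted(tokens))


def similarity(a, b):
    """Score two (first, middle, last) tuples between 0.0 and 1.0."""
    return SequenceMatcher(None, name_key(*a), name_key(*b)).ratio()


def candidate_duplicates(people, threshold=0.7):
    """Return (id_a, id_b, score) for every pair scoring at or above
    the threshold. The threshold is an arbitrary, untuned assumption."""
    pairs = []
    for (id_a, a), (id_b, b) in combinations(sorted(people.items()), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    return pairs


# Hypothetical rows keyed by serial id: (first_name, middle_name, last_name)
people = {
    1: ("Samuel", "L", "Jackson"),
    2: ("Samuel", None, "Jackson"),
    3: ("Sam", "", "Jackson"),
    4: ("Jackson", "L", "Samuel"),
    5: ("John", "", "Smith"),
}

pairs = candidate_duplicates(people)
for id_a, id_b, score in pairs:
    print(id_a, id_b, score)
```

With these sample rows, the two records whose parts are merely swapped between fields (ids 1 and 4) score 1.0, and John Smith is not paired with any of the Jackson rows. Comparing every pair is quadratic, so on a large table the candidates would need to be narrowed down first. A nickname like Sam is only caught here because the strings happen to overlap.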