Hello Michael, On Tue, 2016-11-29 at 19:10 -0500, Michael Sheaver wrote: > Greetings, > > I have two tables that are populated using large datasets from > disparate external systems, and I am trying to match records by > customer name between these two tables. I do not have any > authoritative key, such as customerID or nationalID, by which I can > match them up, and I have found many cases where the same customer > has different first names in the two datasets. A sampling of the > differences is as follows: > > Michael <=> Mike > Tom <=> Thomas > Liz <=> Elizabeth > Margaret <=> Maggie > > How can I build a query in PostgreSQL (v. 9.6) that will find > possible matches like these on nicknames? My initial guess is that I > would have to either find or build some sort of intermediary table > that contains associated names like those above. Sometimes though, > there will be more than matching pairs, like: > > Jim <=> James <=> Jimmy <=> Jimmie > Bill <=> Will <=> Willie <=> William > > and so forth. > > Has anyone used or developed PostgreSQL queries that will find > matches like these? I am running all my database queries. on my local > laptops (Win7 and macOS), so performance or uptime is no issue here. > I am curious to see how others in this community have creatively > solved this common problem. > > One of the PostgreSQL dictionaries (synonym, thesaurus etc.) might > work here, but honestly I am clueless as to how to set this up or use > it in queries successfully. > > Thanks, > Michael (aka Mike, aka Mikey) > Check out chapter F15 in the doco. Try the double metaphone. I worked on something similar many years ago cleaning up input created by data entry clerks from hand written speeding tickets, so as to match with "trusted" data held in a database. As the volume of input was small in comparison with the number of licensed drivers, we could iterate over and over again trying to match it up. HTH. Rob -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general