Search Postgresql Archives

Re: Merge rows based on Levenshtein distance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tuesday, December 2, 2014, mongoose [via PostgreSQL] <[hidden email]> wrote:
David,

Thank you for your prompt reply. I believe your answer helped a lot but it seems I was not clear enough on my description. Basically I want a counter (id) to show if two or more names are similar (i.e. levenshtein distance less than 3) So in the previous example:

From this table:

Name, City
"Booob", "NYC"
"Alex", "Washington"
"Alexj2", "Washington"
"Bob", "NYC"
"Aleex1", "Washington"

to get this table:

id, Name, City
1,"Alex", "Washington"
1,"Aleex1", "Washington"
1,"Alexj2", "Washington"
2,"Bob", "NYC"
2,"Booob", "NYC"

So basically the id is a counter that starts from "1" and increments only when there is a different name. Please notice that the table has its names in a completely random order.


Write and combine a few subqueries that use window functions (namely lag and row_number) to identify groups, label them, and assign rows to each group (using a between condition on a join)

Pondering some (not tested) if you identify the boundary records in a subquery you can assign them a value of 1 while all others take on null.  In the outer query you should be able to assign groups by simply applying the sum function over the entire result such that at each boundary value the presence of the 1 will increment the sum while the null rows will use the sum value from the prior row.

David J.



View this message in context: Re: Merge rows based on Levenshtein distance
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux