Search Postgresql Archives

Re: Merge rows based on Levenshtein distance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 3, 2014 at 9:14 AM, pinker [via PostgreSQL] <[hidden email]> wrote:
There is nice extension in postgres: fuzzystrmatch I have used to calculate the distance. From documetation:

SELECT levenshtein_less_equal('extensive', 'exhaustive',2);

You can use it then with your group by query.


​Something like this - replace the substring(...) comparison with legenshtein_less_equal(...)​ or whatever comparison you find applicable.

In the case below new groups are started whenever the first letter of the value changes.

The first group would be NULL so I add a COALESCE() call to make it 0 - subsequent groups start with 1 and increment properly.

WITH src (val) AS (
VALUES ('A1'::varchar),('A2'),('B1'),('B2'),('B3'),('C1'),('D1')
)
, grp AS (
SELECT val
, CASE WHEN 
            substring(val,1,1) <> substring(lag(val) OVER (ORDER BY val),1,1) 
       THEN 1 
       ELSE NULL 
       END AS changed 
, ROW_NUMBER() OVER (ORDER BY val) AS val_idx
FROM src
)
SELECT val, COALESCE(sum(changed) OVER (ORDER BY val_idx), 0) AS group_id
FROM grp
​;

David J.​
 


View this message in context: Re: Merge rows based on Levenshtein distance
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux