Re: Fastest Index/Algorithm to find similar sentences

On Sat, Jul 27, 2013 at 10:34 PM, Janek Sendrowski <janek12@xxxxxx> wrote:
Hi Sergey Konoplev,
 
Say I'm searching for a sentence like "The tiger is the largest cat species".
 
I can only find the sentences that include all of the words "tiger, largest, cat, species", but I would also like to get the sentences that contain only three or even two of these words.
 
Janek


Hi,

You can use the similarity functions of the pg_trgm extension.

Example:
=# \d+ test
                        Table "public.test"
 Column | Type | Modifiers | Storage  | Stats target | Description 
--------+------+-----------+----------+--------------+-------------
 col    | text |           | extended |              | 
Indexes:
    "test_idx" gin (col gin_trgm_ops)
Has OIDs: no

=# SELECT * FROM test;
                   col                   
-----------------------------------------
 The tiger is the largest cat species
 The cheetah is the fastest  cat species
 The peacock is the largest bird species
(3 rows)

=# SELECT show_limit();
 show_limit 
------------
        0.3
(1 row)

=# SELECT col, similarity(col, 'The tiger is the largest cat species') AS sml
  FROM test WHERE col % 'The tiger is the largest cat species'
  ORDER BY sml DESC, col;
                   col                   |   sml    
-----------------------------------------+----------
 The tiger is the largest cat species    |        1
 The peacock is the largest bird species | 0.511111
 The cheetah is the fastest  cat species | 0.466667
(3 rows)

=# SELECT set_limit(0.5);
 set_limit 
-----------
       0.5
(1 row)

=# SELECT col, similarity(col, 'The tiger is the largest cat species') AS sml
  FROM test WHERE col % 'The tiger is the largest cat species'
  ORDER BY sml DESC, col;
                   col                   |   sml    
-----------------------------------------+----------
 The tiger is the largest cat species    |        1
 The peacock is the largest bird species | 0.511111
(2 rows)

=# SELECT set_limit(0.9);
 set_limit 
-----------
       0.9
(1 row)

=# SELECT col, similarity(col, 'The tiger is the largest cat species') AS sml
  FROM test WHERE col % 'The tiger is the largest cat species'
  ORDER BY sml DESC, col;
                 col                  | sml 
--------------------------------------+-----
 The tiger is the largest cat species |   1
(1 row)


The % operator only returns rows whose similarity to the search string clears the current limit, so a higher limit gives you fewer but closer matches.
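
To see where the sml values come from, here is a rough Python sketch of trigram similarity. It is only an approximation of what pg_trgm does, not the extension's actual code. It assumes ASCII text, lowercases the input, treats runs of letters and digits as words, pads each word with two spaces in front and one behind, and divides the number of shared trigrams by the number of distinct trigrams in both strings. The function names and the ">=" comparison against the limit are my own choices for illustration.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram extraction (assumes ASCII input)."""
    grams = set()
    # Words are runs of letters/digits; everything else is a separator.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Each word gets two spaces prefixed and one space suffixed.
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def similar_rows(rows, query, limit=0.3):
    """Keep rows that reach the limit (assumed >=), best match first."""
    scored = [(row, similarity(row, query)) for row in rows]
    hits = [(row, sml) for row, sml in scored if sml >= limit]
    # Mirrors ORDER BY sml DESC, col from the queries above.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))
```

For the three rows above, this gives 23/45 (about 0.511111) for the peacock sentence and 21/45 (about 0.466667) for the cheetah sentence, which matches the query output. Calling similar_rows with limits of 0.3, 0.5 and 0.9 keeps three, two and one row respectively.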


--
Beena Emerson
