Le 27 févr. 2017 à 10:32, Oleg Bartunov écrivait : > > > On Sun, Feb 26, 2017 at 3:52 PM, Nicolas Paris <niparisco@xxxxxxxxx> wrote: > > Hello, > > AFAIK there is no built-in way to combine full text search and fuzzy > matching > (https://www.postgresql.org/docs/current/static/fuzzystrmatch.html). > By example, phrase searching with tipos in it. > > First I don't know if postgresql concurrents (lucene based...) are able > to do so. > > > Usually, https://www.postgresql.org/docs/current/static/pgtrgm.html is used for > this. > Hi Oleg, Thanks. I thought pgtrgm was not able to index my long texts because of limitation of 8191 bytes per index row for btree. Then I found out it is possible to use pgtrgm over a GIN/GIST index. My final use case is phrase mining in texts. I want my application returns texts that contains approximatly the user entry: Eg: user search "Hello Word" a text containing "blah blah blah hello world blah blah blah" would be returned. Test: postgres=# CREATE table test_trgm (texts text); CREATE TABLE postgres=# CREATE INDEX ON test_trgm USING GIN(texts gin_trgm_ops); CREATE INDEX postgres=# SET enable_seqscan = OFF; SET postgres=# insert into test_trgm VALUES ('blah blah blah hello world blah blah blah'); INSERT 0 1 postgres=# insert into test_trgm VALUES ('blah blah blah hello word blah blah blah'); INSERT 0 1 postgres=# SELECT texts, similarity(texts, 'hello word') FROM test_trgm WHERE texts % 'hello word'; texts | similarity -------------------------------------------+------------ blah blah blah hello world blah blah blah | 0.473684 blah blah blah hello word blah blah blah | 0.6875 (2 rows) postgres=# EXPLAIN SELECT texts, similarity(texts, 'hello word') FROM test_trgm WHERE texts % 'hello word'; QUERY PLAN ----------------------------------------------------------------------------------- Bitmap Heap Scan on test_trgm (cost=52.01..56.03 rows=1 width=32) Recheck Cond: (texts % 'hello word'::text) -> Bitmap Index Scan on test_trgm_texts_idx (cost=0.00..52.01 rows=1 width=0) Index Cond: (texts % 'hello word'::text) (4 rows) Conclusion: If I d'say 0.4 is my threshold, would this methodology meet my requirements ? Thanks for the help ! > > Second, is such feature is in the road map ? > > Third, I wonder if it is a good idea to use the postgresql synonyms > feature for such prupose.(https://www.postgresql.org/docs/current/static/ > textsearch-dictionaries.html) > I mean, building up a synonyms dictionnary containing tipos. By eg: > > postgres pgsql > postgresql pgsql > postgrez pgsql > postgre pgsql > gogle googl > gooogle googl > > There is multiple way to build such dictionary. But my question is about > the implementation of dictionnaries in postgresql: Is postgresql > supposed to take advantage of billion entries dictionaries ? > > > dictionary is just a program, so it's up to developer how to write efficient > program to deal with billion entries. Specifically to synonym dictionary, it's > not intended to work with a lot of entries. btw, have a look on contrib/ > dict_xsyn dictionary, which is more flexible than synonym. > > > Thanks by advance for you answers, > > > -- > Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > > -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general