Kevin,
I am also interested in a deeper explanation of what you mean by, "The easy way to do that is just to add a trigram index and search for similar strings, and forget about full text search." Because I need to make a decision about whether to use full text search or use other pattern matching facilities such as LIKE and/or regular expressions. For me, the reason I don't just default to full text search is the documents are relative small (i.e, HTML <= 128K) and number fewer than 10,000 so I'm not sure if the effort expended to learn the ins/outs of full text search will be beneficial to my use case.
Regards,
Dane
On Thu, Jul 23, 2015 at 12:17 PM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Thu, Jul 23, 2015 at 2:55 AM, amihay gonen <agonenil@xxxxxxxxx> wrote:Hi I'm trying to implement a text search in PG .My goal to enable the user search on several columns also on partial words.here is sample code :create table test_test( text_data tsvector, text_a varchar,text_b varchar);insert into test_test(text_a,text_b) select 'name 10.10.2.3 ','name3 name' from generate_series(1,500);update test_test set text_data=to_tsvector(text_a||' '||text_b);CREATE INDEX test_test_idx ON test_test USING gin(text_data);explain ANALYZE select * from test_test where text_data@@plainto_tsquery('name');my questions are :1. why the index is not used (I guess it is related to the way the data is generated)It returns the entire table, so there is no point in using an index. Yes, it is the way it is generated, the same data repeated over and over is not very realistic. If you just want random text, I use md5(random()::text). But if you want text that looks vaguely like English, I don't have a nice trick for that. Maybe load the sgml files into a table.2, how can I use pg_trgm with ts_vector to enable to answer query like 10.10 or nam ?the idea is to use the gin index , maybe there are other option without using pg_trgm?Do you mean:WHERE text_a LIKE '%10.10%' or text_a LIKE '%nam%' ?With the or, that going to be hard to optimize.Anyway, pg_tgrm requires its own special GIN index, it can't piggy back on the tsvector GIN index.CREATE INDEX whatever ON test_test USING gin(text_a gin_trgm_ops);orCREATE INDEX whatever ON test_test USING gin((text_a||' '||text_b) gin_trgm_ops);But, LIKE '%10.10%' is going to be tough for a pg_trgm index to help with, unless you compile your own code after removing "#define KEEPONLYALNUM"Cheers,Jeff