This may better-belong in pgsql-sql but since it deals with a function as opposed to raw SQL syntax I am sticking it here.... Consider the following DBMS schema slice.... Table "public.post" Column | Type | Modifiers -----------+--------------------------+-------------------------------------------------------- subject | text | message | text | ordinal | integer | not null default nextval('post_ordinal_seq'::regclass) Indexes: "post_pkey" PRIMARY KEY, btree (ordinal) "idx_message" gin (to_tsvector('english'::text, message)) "idx_subject" gin (to_tsvector('english'::text, subject)) (there are a bunch more indices and columns in the table, but these are the ones in question) Now let's run a couple of queries on this: ticker=# explain analyze select * from post where to_tsvector('english', message) @@ to_tsquery('violence') order by modified desc limit 100; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..2456.32 rows=100 width=433) (actual time=266.703..3046.310 rows=100 loops=1) -> Index Scan Backward using post_modified on post (cost=0.00..240400.00 rows=9787 width=433) (actual time=266.698..3045.920 rows=100 loops=1) Filter: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text)) Total runtime: 3046.565 ms (4 rows) Ok, not too bad, considering that the table contains close to 2 million rows - ~3 seconds is pretty good. Now let's try something that's NOT in the database: ticker=# explain analyze select * from post where to_tsvector('english', message) @@ to_tsquery('hoseface') order by modified desc limit 100; NOTICE: word is too long to be indexed DETAIL: Words longer than 2047 characters are ignored. QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..2456.32 rows=100 width=433) (actual time=303350.036..303350.036 rows=0 loops=1) -> Index Scan Backward using post_modified on post (cost=0.00..240400.00 rows=9787 width=433) (actual time=303350.031..303350.031 rows=0 loops=1) Filter: (to_tsvector('english'::text, message) @@ to_tsquery('hoseface'::text)) Total runtime: 303350.079 ms (4 rows) This is **UNBELIEVABLY** slow; indeed, it appears to have done a sequential scan of the entire table! Why? One would think that if the index lookup fails it fails - and would fail FAST, returning no rows. It appears that this is not the case, and the system actually goes in and tries to look up the query off the message contents, IGNORING the index! That's not good for what should be obvious reasons..... is the "gin" index type screwed up in some form or fashion? This behavior is relatively new. I'm running 8.4.3 and this started happening some time before that - I believe it was an issue in 8.4.2, but I KNOW it was not a problem when I was running 8.3. The confounding factor is that the table has grown rapidly and as such "not happening" before might be more due to the table size than the software release - of that I cannot be certain. The other possibility is that the "NOTICE" results in some sort of flag being set that tells the query processor to ignore the index and perform a sequential scan IF there's a failure to match. If this is the case I will then have to write something to go through and find the offending item and remove it. -- Karl
begin:vcard fn:Karl Denninger n:Denninger;Karl email;internet:karl@xxxxxxxxxxxxx x-mozilla-html:TRUE version:2.1 end:vcard
-- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general