Re: Performance regression between 8.3 and 8.4 on heavy text indexing

gael@xxxxxxxxxxxxxxxx (Gaël Le Mignot) · Thu, 27 Aug 2009 22:03:10 +0200

Hello Guillaume!

Wed, 26 Aug 2009 23:59:25 +0200, you wrote: 

 > On Wed, Aug 26, 2009 at 6:29 PM, Tom Lane<tgl@xxxxxxxxxxxxx> wrote:
 >> gael@xxxxxxxxxxxxxxxx (=?iso-8859-1?Q?Ga=EBl?= Le Mignot) writes:
 >>> So it seems it was quite wrong about estimated matching rows (192 predicted, 10222 reals).
 >> 
 >> Yup.  What's even more interesting is that it seems the real win would
 >> have been to use just the 'claude & duviau' condition (which apparently
 >> matched only 14 rows).  8.3 had no hope whatever of understanding that,
 >> it just got lucky.  8.4 should have figured it out, I'm thinking.
 >> Does it help if you increase the statistics target for fulltext_tsv?
 >> (Don't forget to re-ANALYZE after doing so.)

 > It could be interesting to run the query without the condition
 > (keywords_tsv @@ '''assassinat'''::tsquery) to see the estimate of
 > (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery) in 8.4.

Here it is ::

libebench=>  explain analyze SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN ('methode','nica') AND fulltext_tsv @@ to_tsquery('french', 'claude & duviau'))) ORDER BY publicationDate DESC,pageNumber ASC LIMIT 50;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=12264.98..12265.11 rows=50 width=24) (actual time=3.799..3.825 rows=10 loops=1)
   ->  Sort  (cost=12264.98..12271.03 rows=2421 width=24) (actual time=3.794..3.802 rows=10 loops=1)
         Sort Key: publicationdate, pagenumber
         Sort Method:  quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on libeindex  (cost=2363.10..12184.56 rows=2421 width=24) (actual time=3.579..3.693 rows=10 loops=1)
               Recheck Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
               Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND ((classname)::text = 'article'::text))
               ->  Bitmap Index Scan on fulltext_index  (cost=0.00..2362.49 rows=2877 width=0) (actual time=3.499..3.499 rows=14 loops=1)
                     Index Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
 Total runtime: 166.772 ms
(10 rows)

So it estimates 2877 rows for that, while in reality it's 14.

 > Btw, what Tom means by increasing the statistics is executing the
 > following queries:
 > ALTER TABLE libeindex ALTER COLUMN fulltext_tsv SET STATISTICS 500;

Ok, I  did it for 500 also  on the keywords_tsv column,  which was the
other  contestor. Here  we have  a  clear improvement:  the search  in
keyword_tsv is now  estimated at 10398 (real being  10222) and the one
on fulltext_tsv at 1 (real being 14).

I did it at 1000 too, it's almost the same result.

By re-running  our sampling of 7334  queries on the  database with the
statistics at 1000  on both fulltext_tsv and keywords_tsv,  we do have
overall better results than with 8.3 ! So a greeat thanks to everyone.

The  weird thing  was  that with  the  default of  100 for  statistics
target, it was  worse than when we  moved back to 10. So  I didn't try
with 1000, but I should have.

I'll do  more tests and  keep the  list informed if  it can be  of any
help.

-- 
Gaël Le Mignot - gael@xxxxxxxxxxxxxxxx
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance