Re: Performance regression between 8.3 and 8.4 on heavy text indexing

gael@xxxxxxxxxxxxxxxx (Gaël Le Mignot) · Wed, 26 Aug 2009 18:03:34 +0200

Hello Guillaume!

Sun, 23 Aug 2009 14:49:05 +0200, you wrote: 

 > Hi Gaël,
 > On Fri, Aug 21, 2009 at 3:37 PM, Gaël Le Mignot<gael@xxxxxxxxxxxxxxxx> wrote:
 >> With 8.3 ::
 >> 
 >>  Limit  (cost=752.67..752.67 rows=1 width=24)
 >>  (11 rows)
 >> 
 >> With 8.4 ::
 >>  (8 rows)

 > Could you provide us the EXPLAIN *ANALYZE* output of both plans?

Sure, here it is :

With 8.3 ::

libearticles=>  explain analyze SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN ('methode','nica') AND (keywords_tsv @@ plainto_tsquery('french', 'assassinat') AND fulltext_tsv @@ to_tsquery('french', 'claude & duviau')))) ORDER BY publicationDate DESC,pageNumber ASC LIMIT 50;
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=760.74..760.74 rows=1 width=24) (actual time=449.057..449.080 rows=9 loops=1)
   ->  Sort  (cost=760.74..760.74 rows=1 width=24) (actual time=449.053..449.061 rows=9 loops=1)
         Sort Key: publicationdate, pagenumber
         Sort Method:  quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on libeindex  (cost=756.71..760.73 rows=1 width=24) (actual time=420.704..448.571 rows=9 loops=1)
               Recheck Cond: ((keywords_tsv @@ '''assassinat'''::tsquery) AND (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery))
               Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND ((classname)::text = 'article'::text))
               ->  BitmapAnd  (cost=756.71..756.71 rows=1 width=0) (actual time=420.612..420.612 rows=0 loops=1)
                     ->  Bitmap Index Scan on keywords_index  (cost=0.00..48.96 rows=573 width=0) (actual time=129.338..129.338 rows=10225 loops=1)
                           Index Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
                     ->  Bitmap Index Scan on fulltext_index  (cost=0.00..707.50 rows=573 width=0) (actual time=289.775..289.775 rows=14 loops=1)
                           Index Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
 Total runtime: 471.905 ms
(13 rows)

With 8.4 ::

libebench=>  explain analyze SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN ('methode','nica') AND (keywords_tsv @@ plainto_tsquery('french', 'assassinat') AND fulltext_tsv @@ to_tsquery('french', 'claude & duviau')))) ORDER BY publicationDate DESC,pageNumber ASC LIMIT 50;
                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=758.51..758.51 rows=1 width=24) (actual time=50816.635..50816.660 rows=9 loops=1)
   ->  Sort  (cost=758.51..758.51 rows=1 width=24) (actual time=50816.628..50816.637 rows=9 loops=1)
         Sort Key: publicationdate, pagenumber
         Sort Method:  quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on libeindex  (cost=14.03..758.50 rows=1 width=24) (actual time=8810.133..50816.484 rows=9 loops=1)
               Recheck Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
               Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery) AND ((classname)::text = 'article'::text))
               ->  Bitmap Index Scan on keywords_index  (cost=0.00..14.03 rows=192 width=0) (actual time=158.563..158.563 rows=10222 loops=1)
                     Index Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
 Total runtime: 50817.040 ms
(10 rows)

So it seems it was quite wrong about estimated matching rows (192 predicted, 10222 reals).

 >> From what I can see, one of the difference is that the estimates of
 > the number of rows are / 3 for this part of the query:
 > 8.3 ->  Bitmap Index Scan on keywords_index  (cost=0.00..48.97 rows=574 width=0)
 > 8.4 ->  Bitmap Index Scan on keywords_index  (cost=0.00..14.03 rows=192 width=0)

 > It might be interesting to see if 8.4 is right or not.

 > Before 8.4, the selectivity for full text search was a constant (as
 > you can see it in your 8.3 plan: the number of rows are equal in both
 > bitmap index scan). 8.4 is smarter which might lead to other plans.

I see, thanks  for your answer. What's weird  is that this "smartness"
leads to overall worse results in  our case, is there some tweaking we
can  do?   I  didn't  see  anything in  the  documentation  to  change
weighting inside the text-match heuristic.

-- 
Gaël Le Mignot - gael@xxxxxxxxxxxxxxxx
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance