Re: (Questioning the planner's mind) - was Re: Fastest way to drop an index?

Alban Hertroys <dalroi@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> · Thu, 5 Feb 2009 08:45:04 +0100

On Feb 4, 2009, at 5:23 AM, Phoenix Kiula wrote:

On Wed, Feb 4, 2009 at 5:13 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Phoenix Kiula <phoenix.kiula@xxxxxxxxx> writes:
Index Scan using new_idx_testimonials_userid on testimonials
(cost=0.00..157.78 rows=1 width=9) (actual time=8809.715..8809.715
rows=0 loops=1)
  Index Cond: ((user_id)::text = 'superman'::text)
  Filter: ((title_encrypted)::text =  
'b333dc1b0992cb8c70b58a418211389a'::text)
Total runtime: 8809.750 ms

This is using the index to fetch the rows that match user_id =  
'superman',
and then testing each fetched row to see if it has the desired  
value of
title_encrypted.  The fact that hardly any rows pass the filter test
tells nearly nothing about how long this should be expected to run.
The rather high estimated cost suggests that the planner thinks there
are several dozen rows matching the index condition, and the actual
runtime suggests that there are actually hundred or thousands of 'em.
If so, your problem is that you need a different index.  I'd bet on  
an
index on title_encrypted being more useful for this query than the  
one
on user_id; or you could experiment with a two-column index.

Thanks Tom. My thinking exactly.

So I have made a two column index on (user_id, title_encrypted)
already. It's done.

But the planner keeps insisting on using the "user_id" as you see from
that EXPLAIN ANALYZE. This was done when the other two-col index
already exists!

Could I force the 2-col index? I googled for "force index postgresql"
and came upon this discussion -
http://archives.postgresql.org/pgsql-sql/2006-02/msg00190.php  - which
suggests that the planner may be selecting indexes based on "cost'.

I am not too technically savvy, but I think this means that given the
choice of these two scenarios...

   1. Search through "user_id" index, and then limit it by  
"title_encrypted"
or
   2. Search through "user_id, title_encrypted" 2-col index

...the planner decides that it is less resource intensive to go
through the somewhat smaller user_id index and then limit it (i.e.,
scenario 1) than to wade through the bigger second index.

The EXPLAIN ANALYZE of the second form of the query would tell. I  
suppose you'd rather not lose your new_idx_testimonials_userid index,  
but you can drop that in a transaction and roll back. I don't think  
that'd impact other transactions at all, seeing how MVCC just marks it  
deleted for your session, but I can't promise you that. Caution is  
advised ;)

Try:
BEGIN;
EXPLAIN ANALYZE SELECT * FROM testimonials WHERE userid='superman' AND  
title_encrypted='b333dc1b0992cb8c70b58a418211389a';
DROP INDEX new_idx_testimonial_userid;
ANALYZE testimonials;
EXPLAIN ANALYZE SELECT * FROM testimonials WHERE userid='superman' AND  
title_encrypted='b333dc1b0992cb8c70b58a418211389a';
ROLLBACK;

Am I on the right track? If I am, well what's the way around this? How
can I make the planner make use of the 2-col index? Or if my
understanding is not right, why is the scenario 1 being chosen to
begin with?

Thanks for any thoughts! This single query, which used to be much
faster than this, is now slowing down our operations by about 8 second
per query!

Let me know if you need to know any pgsql.conf settings. Only index
related setting I know of are these:

 enable_indexscan             = on
 enable_bitmapscan            = off
 enable_nestloop              = on

I think for this case the planner might be trying a bitmap scan and  
you upped the costs of that significantly by turning it 'off'. The  
earlier query-plan will show. There shouldn't be any need to disable  
bitmap scans in the first place, you might want to try that same query  
with bitmap scans enabled and see how it performs.

Another possible cause is that both indexes are quite large and they  
can't both fit into memory. I suspect the new_idx_tesimonials_userid  
index gets used in other queries as well, so it's likely in the cache.  
I guess that could make using that index and scan through the results  
faster than reading the new index from disk.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,498a9909747034241410875!

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general