Re: Query planner is using wrong index.

Brian Herlihy <btherl@xxxxxxxxxxxx> · Fri, 7 Apr 2006 00:01:51 +1000 (EST)

--- Ragnar <gnari@xxxxxxx> wrote:

> On fim, 2006-04-06 at 19:27 +1000, Brian Herlihy wrote:
> 
> > Yes, the primary key is far better.  I gave it the ultimate test - I
> dropped
> > the (p2, p3) index.  It's blindingly fast when using the PK, 
> 
> I have problems understanding exactly how an index on 
> (p1,p2,p3) can be faster than and index on (p2,p3) for
> a query not involving p1.
> can you demonstrate this with actual EXPLAIN ANALYZES ?
> something like:
> EXPLAIN ANALYZE select * from t where p2 = ? and p3 = ?;
> BEGIN;
> DROP INDEX p2p3;
> EXPLAIN ANALYZE select * from t where p2 = ? and p3 = ?;
> ROLLBACK;
> 
> maybe your p2p3 index needs REINDEX ?
> 

Here's the output.  The timings after caching are repeatable (varying only by
10% or so).  

Query before caching:

db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com'
AND p3 = 'web/results?itag=&q=&kgs=&kls=';
                                                              QUERY PLAN       

---------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using p2_p3_idx on t  (cost=0.00..6.02 rows=1 width=102) (actual
time=2793.247..2793.247 rows=0 loops=1)
   Index Cond: (((p2)::text = 'uk.altavista.com'::text) AND ((p3)::text =
'web/results?itag=&q=&kgs=&kls='::text))
   Filter: ((p1)::text = 'a'::text)
 Total runtime: 2793.303 ms
(4 rows)

Query after caching:

db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com'
AND p3 = 'web/results?itag=&q=&kgs=&kls=';
                                                              QUERY PLAN       

---------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using p2_p3_idx on t  (cost=0.00..6.02 rows=1 width=102) (actual
time=0.617..0.617 rows=0 loops=1)
   Index Cond: (((p2)::text = 'uk.altavista.com'::text) AND ((p3)::text =
'web/results?itag=&q=&kgs=&kls='::text))
   Filter: ((p1)::text = 'a'::text)
 Total runtime: 0.665 ms
(4 rows)

=== At this point I did "DROP INDEX p2_p3_idx"

Query after dropping index:

db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com'
AND p3 = 'web/results?itag=&q=&kgs=&kls=';

QUERY PLAN                                                                     

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.00..6.02 rows=1 width=102) (actual
time=95.188..95.188 rows=0 loops=1)
   Index Cond: (((p1)::text = 'a'::text) AND ((p2)::text =
'uk.altavista.com'::text) AND ((p3)::text =
'web/results?itag=&q=&kgs=&kls='::text))
 Total runtime: 95.239 ms
(3 rows)

Query after dropping index, fully cached:

db# explain analyze select * from t WHERE p1 = 'a' and p2 = 'uk.altavista.com'
AND p3 = 'web/results?itag=&q=&kgs=&kls=';

QUERY PLAN                                                                     

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.00..6.02 rows=1 width=102) (actual
time=0.030..0.030 rows=0 loops=1)
   Index Cond: (((p1)::text = 'a'::text) AND ((p2)::text =
'uk.altavista.com'::text) AND ((p3)::text =
'web/results?itag=&q=&kgs=&kls='::text))
 Total runtime: 0.077 ms
(3 rows)

And one where the query planner chooses the primary key instead.  Both p2 and
p3 are present as Most Common Values in pg_statistics:

Query before fully cached:

db# explain analyze SELECT * FROM t WHERE p1 = 'b' AND p2 = 'www.google.com'
AND p3 = 'search?hl=&lr=&q=';

QUERY PLAN                                                                     

------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.00..6.02 rows=1 width=102) (actual
time=212.092..212.100 rows=1 loops=1)
   Index Cond: (((p1)::text = 'b'::text) AND ((p2)::text =
'www.google.com'::text) AND ((p3)::text = 'search?hl=&lr=&q='::text))
 Total runtime: 212.159 ms
(3 rows)

Query after fully cached:

db# explain analyze SELECT * FROM t WHERE p1 = 'b' AND p2 = 'www.google.com'
AND p3 = 'search?hl=&lr=&q=';

QUERY PLAN                                                                     

------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.00..6.02 rows=1 width=102) (actual
time=0.034..0.039 rows=1 loops=1)
   Index Cond: (((p1)::text = 'b'::text) AND ((p2)::text =
'www.google.com'::text) AND ((p3)::text = 'search?hl=&lr=&q='::text))
 Total runtime: 0.094 ms
(3 rows)

I have set statistics to 1000 on all of p1, p2 and p3.  The table was recently
vacuumed and analyzed, and the index was recreated (after being dropped) before
these tests were run.  The tests are 100% reproducible, both in postgresql
7.4.7 and 8.1.3.

The indexes are:

t_pkey (p1, p2, p3)  -- UNIQUE, PRIMARY KEY
p2_p3_idx (p2, p3)   -- NOT UNIQUE

The problem is that a lookup which specifies p2 and p3 can return as many as
500 rows.  The optimizer assumes that such a lookup will return 1 row, and so
it chooses a bad plan.  That sums it up.

What I need is a way to make it choose the primary key.

Thanks in advance,
Brian