Please do this small optimization if it is possible. It seem that the
optimizer have the all information to create a fast plan but it does not
do that.
create temp table t1 (id bigint, t bigint);
insert into t1 values (1, 1);
insert into t1 values (2, 2);
insert into t1 values (2, 3);
insert into t1 values (2, 4);
insert into t1 values (3, 5);
create temp table t2 (id bigint, t bigint);
insert into t2 (id, t)
select g, 2
from generate_series(1, 200) g;
insert into t2 (id, t)
select g, 3
from generate_series(201, 300) g;
insert into t2 (id, t)
select g, 4
from generate_series(301, 400) g;
insert into t2 (id, t)
select g, 1
from generate_series(401, 100000) g;
insert into t2 (id, t)
select g, 5
from generate_series(100001, 100100) g;
create index t_idx on t2(t);
analyze t1;
analyze t2;
explain analyze
select *
from t2
join t1 on t1.t = t2.t
where t1.t = 2
explain analyze
select *
from t2
join t1 on t1.t = t2.t
where t1.id = 3
explain analyze
select *
from t2
where t2.t in (2, 3, 4)
These two queries are completely equal and optimizator should know it as
I see from the plans:
"Hash Join (cost=1.09..2667.09 rows=75000 width=32) (actual
time=0.026..100.207 rows=400 loops=1)"
" Hash Cond: (t2.t = t1.t)"
" -> Seq Scan on t2 (cost=0.00..1541.00 rows=100000 width=16) (actual
time=0.007..47.083 rows=100000 loops=1)"
" -> Hash (cost=1.05..1.05 rows=3 width=16) (actual time=0.011..0.011
rows=3 loops=1)"
" -> Seq Scan on t1 (cost=0.00..1.05 rows=3 width=16) (actual
time=0.005..0.008 rows=3 loops=1)" <-- HERE IS THE PROBLEM. IF THE
ESTIMATED COUNT = 1 OPTIMIZER BUILDS THE CORRECT FAST PLAN, BUT IF THE
ESTIMATION IS GREATER THAN 1 WE HAVE A PROBLEM
" Filter: (id = 2)"
"Total runtime: 100.417 ms"
"Nested Loop (cost=0.00..1024.46 rows=20020 width=32) (actual
time=0.030..0.222 rows=100 loops=1)"
" -> Seq Scan on t1 (cost=0.00..1.05 rows=1 width=16) (actual
time=0.008..0.009 rows=1 loops=1)"
" Filter: (id = 3)"
" -> Index Scan using t_idx on t2 (cost=0.00..773.16 rows=20020
width=16) (actual time=0.016..0.078 rows=100 loops=1)"
" Index Cond: (t2.t = t1.t)"
"Total runtime: 0.296 ms"
"Bitmap Heap Scan on t2 (cost=16.09..556.80 rows=429 width=16) (actual
time=0.067..0.256 rows=400 loops=1)"
" Recheck Cond: (t = ANY ('{2,3,4}'::bigint[]))"
" -> Bitmap Index Scan on t_idx (cost=0.00..15.98 rows=429 width=0)
(actual time=0.056..0.056 rows=400 loops=1)"
" Index Cond: (t = ANY ('{2,3,4}'::bigint[]))"
"Total runtime: 0.458 ms"
An ugly workaround is to add the column t1(t) in the table t2.
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance