Optimization idea

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Please do this small optimization if it is possible. It seem that the optimizer have the all information to create a fast plan but it does not do that.

create temp table t1 (id bigint, t bigint);

insert into t1 values (1, 1);
insert into t1 values (2, 2);
insert into t1 values (2, 3);
insert into t1 values (2, 4);
insert into t1 values (3, 5);

create temp table t2 (id bigint, t bigint);

insert into t2 (id, t)
select g, 2
from generate_series(1, 200) g;

insert into t2 (id, t)
select g, 3
from generate_series(201, 300) g;

insert into t2 (id, t)
select g, 4
from generate_series(301, 400) g;

insert into t2 (id, t)
select g, 1
from generate_series(401, 100000) g;

insert into t2 (id, t)
select g, 5
from generate_series(100001, 100100) g;

create index t_idx on t2(t);

analyze t1;
analyze t2;

explain analyze
select *
from t2
 join t1 on t1.t = t2.t
where t1.t = 2

explain analyze
select *
from t2
 join t1 on t1.t = t2.t
where t1.id = 3

explain analyze
select *
from t2
where t2.t in (2, 3, 4)


These two queries are completely equal and optimizator should know it as I see from the plans:

"Hash Join (cost=1.09..2667.09 rows=75000 width=32) (actual time=0.026..100.207 rows=400 loops=1)"
"  Hash Cond: (t2.t = t1.t)"
" -> Seq Scan on t2 (cost=0.00..1541.00 rows=100000 width=16) (actual time=0.007..47.083 rows=100000 loops=1)" " -> Hash (cost=1.05..1.05 rows=3 width=16) (actual time=0.011..0.011 rows=3 loops=1)" " -> Seq Scan on t1 (cost=0.00..1.05 rows=3 width=16) (actual time=0.005..0.008 rows=3 loops=1)" <-- HERE IS THE PROBLEM. IF THE ESTIMATED COUNT = 1 OPTIMIZER BUILDS THE CORRECT FAST PLAN, BUT IF THE ESTIMATION IS GREATER THAN 1 WE HAVE A PROBLEM
"              Filter: (id = 2)"
"Total runtime: 100.417 ms"

"Nested Loop (cost=0.00..1024.46 rows=20020 width=32) (actual time=0.030..0.222 rows=100 loops=1)" " -> Seq Scan on t1 (cost=0.00..1.05 rows=1 width=16) (actual time=0.008..0.009 rows=1 loops=1)"
"        Filter: (id = 3)"
" -> Index Scan using t_idx on t2 (cost=0.00..773.16 rows=20020 width=16) (actual time=0.016..0.078 rows=100 loops=1)"
"        Index Cond: (t2.t = t1.t)"
"Total runtime: 0.296 ms"

"Bitmap Heap Scan on t2 (cost=16.09..556.80 rows=429 width=16) (actual time=0.067..0.256 rows=400 loops=1)"
"  Recheck Cond: (t = ANY ('{2,3,4}'::bigint[]))"
" -> Bitmap Index Scan on t_idx (cost=0.00..15.98 rows=429 width=0) (actual time=0.056..0.056 rows=400 loops=1)"
"        Index Cond: (t = ANY ('{2,3,4}'::bigint[]))"
"Total runtime: 0.458 ms"

An ugly workaround is to add the column t1(t) in the table t2.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux