Hi Experts,
As in the example below, i think the plan which hash table is created on testtbl2 (the fewer tuples) should be choosen.
Because creating of hash table should faster in testtbl2. But it did not.
I have tried to change the ordering of table by tuning parameter even if using pg_hint_plan but not success.
Why does planner do not choose the plan which hash table is created on testtbl2 (which can take less time)?
And how to change the order?
# I also confirm planner info by rebuild postgresql but not found related usefull info about hash table
---
postgres=# create table testtbl1(id integer, c1 text, c2 text, c3 text, primary key (c1,c2,c3));
CREATE TABLE
postgres=# create table testtbl2(id integer, c1 text, c2 text, c3 text, primary key (c1,c2,c3));
CREATE TABLE
postgres=# insert into testtbl1 select generate_series(1,1000000),random()::text,random()::text,random()::text;
INSERT 0 1000000
postgres=# insert into testtbl2 select * from testtbl1 where id%7 = 0;
INSERT 0 142857
postgres=# explain analyze select * from testtbl1 inner join testtbl2 using(c1,c2,c3);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=38775.00..47171.72 rows=1 width=59) (actual time=1120.824..1506.236 rows=142857 loops=1)
Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2) AND (testtbl2.c3 = testtbl1.c3))
-> Seq Scan on testtbl2 (cost=0.00..3039.57 rows=142857 width=56) (actual time=0.008..27.964 rows=142857 loops=1)
-> Hash (cost=21275.00..21275.00 rows=1000000 width=55) (actual time=1120.687..1120.687 rows=1000000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 89713kB
-> Seq Scan on testtbl1 (cost=0.00..21275.00 rows=1000000 width=55) (actual time=0.035..458.522 rows=1000000 loops=1)
Planning time: 0.922 ms
Execution time: 1521.258 ms
(8 rows)
postgres=# set pg_hint_plan.enable_hint to on;
SET
postgres=# /*+
postgres*# HashJoin(testtbl1 testtbl2)
postgres*# Leading(testtbl1 testtbl2)
postgres*# */
postgres-# explain analyze select * from testtbl1 inner join testtbl2 using(c1,c2,c3);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=48541.00..67352.86 rows=1 width=59) (actual time=1220.625..1799.709 rows=142857 loops=1)
Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2) AND (testtbl2.c3 = testtbl1.c3))
-> Seq Scan on testtbl2 (cost=0.00..3039.57 rows=142857 width=56) (actual time=0.011..58.649 rows=142857 loops=1)
-> Hash (cost=21275.00..21275.00 rows=1000000 width=55) (actual time=1219.295..1219.295 rows=1000000 loops=1)
Buckets: 8192 Batches: 32 Memory Usage: 2851kB
-> Seq Scan on testtbl1 (cost=0.00..21275.00 rows=1000000 width=55) (actual time=0.021..397.583 rows=1000000 loops=1)
Planning time: 3.971 ms
Execution time: 1807.710 ms
(8 rows)
postgres=#
---
Thanks and best regard!