How to change order sort of table in HashJoin

Man Trieu <man.trieu@xxxxxxxxx> · Sat, 19 Nov 2016 14:46:47 +0900

Hi Experts,

As in the example below, i think the plan which hash table is created on testtbl2 (the fewer tuples) should be choosen.
Because creating of hash table should faster in testtbl2. But it did not. 

I have tried to change the ordering of table by tuning parameter even if using pg_hint_plan but not success.

Why does planner do not choose the plan which hash table is created on testtbl2 (which can take less time)?
And how to change the order?

# I also confirm planner info by rebuild postgresql but not found related usefull info about hash table  

---
postgres=# create table testtbl1(id integer, c1 text, c2 text, c3 text, primary key (c1,c2,c3));
CREATE TABLE
postgres=# create table testtbl2(id integer, c1 text, c2 text, c3 text, primary key (c1,c2,c3));
CREATE TABLE
postgres=# insert into testtbl1 select generate_series(1,1000000),random()::text,random()::text,random()::text;
INSERT 0 1000000
postgres=# insert into testtbl2 select * from testtbl1 where id%7 = 0;
INSERT 0 142857

postgres=# explain analyze select * from testtbl1 inner join testtbl2 using(c1,c2,c3);
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=38775.00..47171.72 rows=1 width=59) (actual time=1120.824..1506.236 rows=142857 loops=1)
   Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2) AND (testtbl2.c3 = testtbl1.c3))
   ->  Seq Scan on testtbl2  (cost=0.00..3039.57 rows=142857 width=56) (actual time=0.008..27.964 rows=142857 loops=1)
   ->  Hash  (cost=21275.00..21275.00 rows=1000000 width=55) (actual time=1120.687..1120.687 rows=1000000 loops=1)
         Buckets: 131072  Batches: 1  Memory Usage: 89713kB
         ->  Seq Scan on testtbl1  (cost=0.00..21275.00 rows=1000000 width=55) (actual time=0.035..458.522 rows=1000000 loops=1)
 Planning time: 0.922 ms
 Execution time: 1521.258 ms
(8 rows)

postgres=# set pg_hint_plan.enable_hint to on;
SET
postgres=# /*+
postgres*# HashJoin(testtbl1 testtbl2)
postgres*# Leading(testtbl1 testtbl2)
postgres*# */
postgres-# explain analyze select * from testtbl1 inner join testtbl2 using(c1,c2,c3);
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=48541.00..67352.86 rows=1 width=59) (actual time=1220.625..1799.709 rows=142857 loops=1)
   Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2) AND (testtbl2.c3 = testtbl1.c3))
   ->  Seq Scan on testtbl2  (cost=0.00..3039.57 rows=142857 width=56) (actual time=0.011..58.649 rows=142857 loops=1)
   ->  Hash  (cost=21275.00..21275.00 rows=1000000 width=55) (actual time=1219.295..1219.295 rows=1000000 loops=1)
         Buckets: 8192  Batches: 32  Memory Usage: 2851kB
         ->  Seq Scan on testtbl1  (cost=0.00..21275.00 rows=1000000 width=55) (actual time=0.021..397.583 rows=1000000 loops=1)
 Planning time: 3.971 ms
 Execution time: 1807.710 ms
(8 rows)

postgres=#
---

Thanks and best regard!