question on hash joins

"Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]" <robert.m.hartranft@xxxxxxxx> · Wed, 18 Oct 2017 20:00:14 +0000

Hi All,

I have a question about hash joins and the meaning of the width in the explain output for a query.

I have two large tables table1 has 1 million rows and table2 has 600 million rows

When I try to join these two tables based using one constraint on table1
(which reduces the candidate rows down to 660,888) the optimizer seems to correctly
choose to hash the values from table1.

explain select count(1) from table1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467;
                                                          QUERY PLAN                                                          

-------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=18200480.80..18200480.81 rows=1 width=8)
   ->  Hash Join  (cost=103206.82..18190602.43 rows=3951347 width=0)
         Hash Cond: (x.granuleid = g.granuleid)
         ->  Seq Scan on table2 x  (cost=0.00..10596253.01 rows=644241901 width=8)
         ->  Hash  (cost=92363.72..92363.72 rows=660888 width=8)
               ->  Index Only Scan using idx_table1 on table1 g  (cost=0.57..92363.72 rows=660888 width=8)
                     Index Cond: (collectionid = '22467'::bigint)
(7 rows)

My question is, what gets put into the Hash?
I assume the with "width=8" must refer to the size of the key.
Does the entire row get copied to the hash as the corresponding value?

The reason I ask is because, when I try to run the query it fails due to
temp file use over 10GB.  How do I accurately determine the amount of memory
that will be used.

select count(1) from table1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467;
ERROR:  temporary file size exceeds temp_file_limit (10485760 kB)

Thanks in advance,
Bob