"Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]" <robert.m.hartranft@xxxxxxxx> writes: > explain select count(1) from table1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=18200480.80..18200480.81 rows=1 width=8) > -> Hash Join (cost=103206.82..18190602.43 rows=3951347 width=0) > Hash Cond: (x.granuleid = g.granuleid) > -> Seq Scan on table2 x (cost=0.00..10596253.01 rows=644241901 width=8) > -> Hash (cost=92363.72..92363.72 rows=660888 width=8) > -> Index Only Scan using idx_table1 on table1 g (cost=0.57..92363.72 rows=660888 width=8) > Index Cond: (collectionid = '22467'::bigint) > (7 rows) > My question is, what gets put into the Hash? > I assume the with "width=8" must refer to the size of the key. It's the data payload width, but there's some per-row overhead too, which is more than 8 bytes ... don't recall how much more at the moment. > The reason I ask is because, when I try to run the query it fails due to > temp file use over 10GB. What have you got work_mem set to? If the hashed table exceeds that, you're going to end up with a multi-batch join, in which case most of the outer table is going to get written into temp files as well. I don't think it's a good idea to have such a small temp_file_limit when you're working with such huge tables. Either a merge or hash join is likely to require temp file space similar to the table size, and you do *not* want a nestloop join, at least not if you want your answers in reasonable time. regards, tom lane -- Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin