"Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]" <robert.m.hartranft@xxxxxxxx> writes: > explain select count(1) from table1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=18200480.80..18200480.81 rows=1 width=8) > -> Hash Join (cost=103206.82..18190602.43 rows=3951347 width=0) > Hash Cond: (x.granuleid = g.granuleid) > -> Seq Scan on table2 x (cost=0.00..10596253.01 rows=644241901 width=8) > -> Hash (cost=92363.72..92363.72 rows=660888 width=8) > -> Index Only Scan using idx_table1 on table1 g (cost=0.57..92363.72 rows=660888 width=8) > Index Cond: (collectionid = '22467'::bigint) > (7 rows) > My question is, what gets put into the Hash? > I assume the with "width=8" must refer to the size of the key. It's the data payload width, but there's some per-row overhead too, which is more than 8 bytes ... don't recall how much more at the moment. > The reason I ask is because, when I try to run the query it fails due to > temp file use over 10GB. What have you got work_mem set to? If the hashed table exceeds that, you're going to end up with a multi-batch join, in which case most of the outer table is going to get written into temp files as well. I don't think it's a good idea to have such a small temp_file_limit when you're working with such huge tables. Either a merge or hash join is likely to require temp file space similar to the table size, and you do *not* want a nestloop join, at least not if you want your answers in reasonable time. regards, tom lane -- Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin