On Mon, Dec 19, 2022 at 9:11 AM Justin Pryzby <pryzby@xxxxxxxxxxxxx> wrote:
> On Sun, Dec 18, 2022 at 06:29:41PM +0100, Frits Jalvingh wrote:
> > Just to be clear: my real question is: why is temp_file_limit not
> > working at the specified size? Because this is my real problem: when a
> > query is dying like this it will also kill other queries because these
> > are also running out of space. Even when the limit is per-process it
> > should not have exceeded 200GB imo.

It's really the limit for a single file (or virtual file, because we
split them on 1GB boundaries -- probably well past time we stopped doing
that), but we create many temporary files for various reasons.

One possibility is that you've hit a case that needs several rounds of
repartitioning (because of a failure to estimate the number of tuples
well), but we can't see that because you didn't show EXPLAIN (ANALYZE)
output (understandably, if it runs out of disk space before
completing...).

The parallel hash code doesn't free up the previous generations'
temporary files; it really only needs two generations' worth
concurrently (the one it's reading from and the one it's writing to).
In rare cases where more generations are needed, it could unlink the
older ones -- that hasn't been implemented yet.

If you set log_temp_files = 0 to log temporary file names, it should be
clear if it's going through multiple rounds of repartitioning, from the
names (...of32..., ...of64..., ...of128..., ...of256..., ...).
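
For illustration, a minimal way to do that from a psql session (the
query itself is a placeholder; log_temp_files normally needs superuser
to SET per-session, otherwise set it in postgresql.conf and reload):

    -- 0 means log the name and size of every temporary file when it is
    -- removed, regardless of its size
    SET log_temp_files = 0;

    -- run the problem query; the per-batch temporary file names will
    -- show up in the server log
    EXPLAIN (ANALYZE) SELECT ...;   -- placeholder for the actual query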