Jeremy Finzel <finzelj@xxxxxxxxx> writes:
> We are running postgres 11.9 (were running 11.7 prior to recent restart)
> on a large db (10s of TB) with 5 or 6 tablespaces and 1000s of
> tables/indexes.  Within the past few days we have started to see a few
> queries running for over 8 hours which we then attempt to terminate, but
> will not terminate.  These queries are also generating hundreds of
> thousands of tiny/empty temp files.  In fact, before the restart there
> were over 23 million files in pg_tmp which were removed.  We also have
> verified no server settings have changed for at least a couple weeks,
> well before this issue started happening only in the past few days.

Hm.  For the query plan you show, I think the only plausible explanation
for using temp files is that a hash join's hash table is exceeding
work_mem, so it's spilling batches of tuples to disk.  With some bad luck
those could be small, not large.

But I don't see anything in our commit logs between 11.7 and 11.9 that
looks like it would have affected any of that behavior.  (There were some
changes to code governing temp-file tablespace selection, but that could
only affect where the files get put, not how big they are.)  So I doubt
that this can be blamed on the update, especially since, if I read you
correctly, it didn't start happening immediately after the update.

I'm wondering about changes in table statistics possibly causing a
poorly-chosen change in the hashing parameters.  What have you got
work_mem set to?  Can you comment on whether the estimated rowcounts
shown in the query plan are accurate?  Does manually ANALYZE'ing the
tables used in the query change anything?

The "unkillable" aspect is odd, but I wonder if that's just a red
herring.  A query that's generated lots of temp files will try to clean
them up at termination, so maybe the backend is just sitting there
removing temp files before it'll give control back.

			regards, tom lane
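
[A minimal sketch of the checks suggested above, not part of the original
message.  "some_table" is a hypothetical stand-in for the tables actually
used in the problem query, and the final SELECT is a placeholder for that
query; adjust both to the real objects involved.]

    -- Current work_mem for this session (the server-wide default comes
    -- from postgresql.conf or ALTER SYSTEM).
    SHOW work_mem;

    -- Cumulative temp-file activity per database since the last stats
    -- reset; a rapidly growing temp_files count points at spilling nodes.
    SELECT datname, temp_files, temp_bytes
    FROM pg_stat_database
    ORDER BY temp_files DESC;

    -- Refresh planner statistics for the tables in the query.
    ANALYZE some_table;

    -- Re-run the query under EXPLAIN ANALYZE and look at the Hash node:
    -- "Batches: N" with N much greater than 1 means the hash table did
    -- not fit in work_mem and was split into batches spilled to temp files.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT ...;  -- the problem query goes here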