Fwd: temp_file_limit?

Frits Jalvingh <jal@xxxxxx> · Sun, 18 Dec 2022 18:29:41 +0100

Hi Justin, thanks for your help!
Simple things first:
- I am running a single query on a developer machine. Nothing else uses the database at that point.
- The database runs on a disk that has 473GB in use and 1.3T still free. I am watching the increase in size used (watch df -hl /d2).
- If I remove the temp_file_limit the query will run until it has used the 1.3TB that was free, then it dies.
- when it runs I see two PG processes active: a main and a worker process for that main.

I  hope this answers some of the questions: yes, the query is the one using the tempspace; it is the only one running; it uses only one parallel worker.

Just to be clear: my real question is: why is temp_file_limit not working at the specified size? Because this is my real problem: when a query is dying like this it will also kill other queries because these are also running out of space. Even when the limit is per-process it should not have exceeded 200GB imo. BTW, if that limit is really per process instead of per session/query then that is a Very Bad Thing(tm), because this makes the limit effectively worthless - if a query can spawn 8 parallel processes then you can suddenly, without any form of control, again fill up that disk.

I'm not really asking for a solution to the bad performance, but hints are always welcome so I'll include the requested info below:

With the failing plan the query never finishes; it just uses 1.3TB of space, then dies.
This also means I cannot explain analyze as this does not produce output when the query dies. This is a pretty terrible bug in my eyes, because you cannot get the info when it's most needed. If I ever have time left to work on Postgres' code this will be the first thing to fix 8-/

Anyway. The plan that fails badly is this one:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=37360.85..37360.86 rows=1 width=42)
   ->  Sort  (cost=37360.85..37360.85 rows=1 width=42)
         Sort Key: (COALESCE(tijd.tijdkey, 'Unknown'::character varying)), s_h_eenheid_ssm.identificatie
         ->  Hash Join  (cost=34899.49..37360.84 rows=1 width=42)
               Hash Cond: ((ve03678.calender_id)::text = (COALESCE(tijd.tijdkey, 'Unknown'::character varying))::text)
               Join Filter: ((s_h_eenheid_ssm.dv_start_dts <= tijd.einddatum) AND (s_h_eenheid_ssm.dv_end_dts > tijd.einddatum) AND (l_eenheid_sturingslabel_ssm_pe.dv_start_dts <= tijd.einddatum) AND (l_eenheid_sturingslabel_ssm_pe.dv_end_dts > tijd.einddatum) AND (sturingslabel_pe.dv_start_dts <= tijd.einddatum) AND (sturingslabel_pe.dv_end_dts > tijd.einddatum))
               ->  Gather  (cost=34897.66..37358.98 rows=1 width=65)
                     Workers Planned: 1
                     ->  Parallel Hash Join  (cost=33897.66..36358.88 rows=1 width=65)
                           Hash Cond: ((s_h_eenheid_ssm.id_h_eenheid = l_eenheid_sturingslabel_ssm_pe.id_h_eenheid) AND (COALESCE(s_h_eenheid_ssm.id_s, '-1'::integer) = ve03678.eenheid_id))
                           ->  Parallel Seq Scan on s_h_eenheid_ssm  (cost=0.00..2326.55 rows=17955 width=34)
                           ->  Parallel Hash  (cost=33896.02..33896.02 rows=109 width=47)
                                 ->  Parallel Hash Join  (cost=18850.80..33896.02 rows=109 width=47)
                                       Hash Cond: (ve03678.ve03678 = sturingslabel_pe.datum)
                                       ->  Parallel Seq Scan on ve0367801 ve03678  (cost=0.00..12584.92 rows=655792 width=15)
                                       ->  Parallel Hash  (cost=18850.78..18850.78 rows=1 width=40)
                                             ->  Parallel Hash Join  (cost=15458.27..18850.78 rows=1 width=40)
                                                   Hash Cond: (l_eenheid_sturingslabel_ssm_pe.id_h_sturingslabel = sturingslabel_pe.id_h_sturingslabel)
                                                   ->  Parallel Seq Scan on l_eenheid_sturingslabel_ssm l_eenheid_sturingslabel_ssm_pe  (cost=0.00..2963.36 rows=114436 width=24)
                                                   ->  Parallel Hash  (cost=15458.26..15458.26 rows=1 width=24)
                                                         ->  Parallel Seq Scan on s_h_sturingslabel_ssm sturingslabel_pe  (cost=0.00..15458.26 rows=1 width=24)
                                                               Filter: ((soort = 'MSL'::text) AND (code = 'DAE'::text))
               ->  Hash  (cost=1.37..1.37 rows=37 width=11)
                     ->  Seq Scan on tijd  (cost=0.00..1.37 rows=37 width=11)
(24 rows)
https://explain.depesz.com/s/qwsh

By itself I'm used to bad query performance in Postgresql; our application only does bulk queries and Postgres quite often makes terrible plans for those, but with set enable_nestloop=false set always most of them at least execute. The remaining failing queries are almost 100% caused by bad join sequences; I plan to work around those by forcing the join order from our application. For instance, the exact same query above can also generate the following plan (this one was created by manually setting join_collapse_limit = 1, but fast variants also occur quite often when disabling parallelism):
 Unique  (cost=70070.71..70070.72 rows=1 width=42) (actual time=4566.379..4766.112 rows=1058629 loops=1)
   ->  Sort  (cost=70070.71..70070.71 rows=1 width=42) (actual time=4566.377..4618.021 rows=1058629 loops=1)
         Sort Key: (COALESCE(tijd.tijdkey, 'Unknown'::character varying)), s_h_eenheid_ssm.identificatie
         Sort Method: quicksort  Memory: 115317kB
         ->  Gather  (cost=50108.01..70070.70 rows=1 width=42) (actual time=1297.620..1651.003 rows=1058629 loops=1)
               Workers Planned: 1
               Workers Launched: 1
               ->  Parallel Hash Join  (cost=49108.01..69070.60 rows=1 width=42) (actual time=1294.655..1604.524 rows=529314 loops=2)
                     Hash Cond: (((ve03678.calender_id)::text = (COALESCE(tijd.tijdkey, 'Unknown'::character varying))::text) AND (ve03678.eenheid_id = COALESCE(s_h_eenheid_ssm.id_s, '-1'::integer)) AND (ve03678.ve03678 = sturingslabel_pe.datum))
                     ->  Parallel Seq Scan on ve0367801 ve03678  (cost=0.00..12584.92 rows=655792 width=15) (actual time=0.004..27.971 rows=557423 loops=2)
                     ->  Parallel Hash  (cost=49107.99..49107.99 rows=1 width=25) (actual time=1294.347..1294.352 rows=542512 loops=2)
                           Buckets: 2097152 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 100728kB
                           ->  Parallel Hash Join  (cost=39244.15..49107.99 rows=1 width=25) (actual time=390.276..1089.770 rows=542512 loops=2)
                                 Hash Cond: (l_eenheid_sturingslabel_ssm_pe.id_h_sturingslabel = sturingslabel_pe.id_h_sturingslabel)
                                 Join Filter: ((sturingslabel_pe.dv_start_dts <= tijd.einddatum) AND (sturingslabel_pe.dv_end_dts > tijd.einddatum))
                                 ->  Hash Join  (cost=23785.87..33486.40 rows=43548 width=29) (actual time=342.982..791.469 rows=3367092 loops=2)
                                       Hash Cond: (l_eenheid_sturingslabel_ssm_pe.id_h_eenheid = s_h_eenheid_ssm.id_h_eenheid)
                                       Join Filter: ((l_eenheid_sturingslabel_ssm_pe.dv_start_dts <= tijd.einddatum) AND (l_eenheid_sturingslabel_ssm_pe.dv_end_dts > tijd.einddatum))
                                       ->  Parallel Seq Scan on l_eenheid_sturingslabel_ssm l_eenheid_sturingslabel_ssm_pe  (cost=0.00..2963.36 rows=114436 width=24) (actual time=0.002..5.818 rows=97271 loops=2)
                                       ->  Hash  (cost=22217.33..22217.33 rows=125483 width=29) (actual time=342.703..342.705 rows=1129351 loops=2)
                                             Buckets: 2097152 (originally 131072)  Batches: 1 (originally 1)  Memory Usage: 86969kB
                                             ->  Nested Loop  (cost=0.00..22217.33 rows=125483 width=29) (actual time=0.039..175.471 rows=1129351 loops=2)
                                                   Join Filter: ((s_h_eenheid_ssm.dv_start_dts <= tijd.einddatum) AND (s_h_eenheid_ssm.dv_end_dts > tijd.einddatum))
                                                   ->  Seq Scan on s_h_eenheid_ssm  (cost=0.00..2452.23 rows=30523 width=34) (actual time=0.022..4.488 rows=30523 loops=2)
                                                   ->  Materialize  (cost=0.00..1.56 rows=37 width=11) (actual time=0.000..0.001 rows=37 loops=61046)
                                                         ->  Seq Scan on tijd  (cost=0.00..1.37 rows=37 width=11) (actual time=0.009..0.013 rows=37 loops=2)
                                 ->  Parallel Hash  (cost=15458.26..15458.26 rows=1 width=24) (actual time=47.265..47.265 rows=69 loops=2)
                                       Buckets: 1024  Batches: 1  Memory Usage: 72kB
                                       ->  Parallel Seq Scan on s_h_sturingslabel_ssm sturingslabel_pe  (cost=0.00..15458.26 rows=1 width=24) (actual time=4.478..47.241 rows=69 loops=2)
                                             Filter: ((soort = 'MSL'::text) AND (code = 'DAE'::text))
                                             Rows Removed by Filter: 233072
 Planning Time: 0.623 ms
 Execution Time: 5144.937 ms
(33 rows)
https://explain.depesz.com/s/CKhC

Same query, now runs in 5 seconds.

This query is behaving quite special in one of our customers' databases; it runs about 80% of the time in 8 to 16 seconds, about 15% of the time it takes about 2 hours, and the remaining 5% it dies with a disk space issue...

Regards,

Frits