Hi,
During investigation of the OOM server crash (with more than 30GB available memory) I found it was triggered by a single query.
related settings:
work_mem=32MB
hash_mem_multiplier=1
max_parallel_workers_per_gather=3
Now the result of the investigation performed on an idle server with no other activity on it.
Now to results:
Query and explain (analyze, costs, buffers, timing) results:
explain (analyze, costs, buffers, timing) select * from employer_manager WHERE employer_id IN (
SELECT employer_id
FROM
employer e
WHERE
EXISTS (
SELECT 1
FROM employer_service es
JOIN account_service_package_entry aspe
ON (aspe.service_package_id = es.employer_service_id)
JOIN account_service_event ase
ON (ase.account_service_package_entry_id = aspe.id)
JOIN vacancy v
ON (ase.object_id = v.vacancy_id)
WHERE v.employer_id = e.employer_id AND es.price = 0
)
);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=12483578.55..12699754.67 rows=4951913 width=183) (actual time=402888.266..541201.083 rows=3456146 loops=1)
Workers Planned: 3
Workers Launched: 3
Buffers: shared hit=10227924, temp read=5404033 written=9746488
-> Parallel Hash Semi Join (cost=12483478.55..12650135.54 rows=1597391 width=183) (actual time=402879.754..403927.356 rows=864036 loops=4)
Hash Cond: (employer_manager.employer_id = v.employer_id)
Buffers: shared hit=10227924, temp read=5404033 written=9746488
-> Parallel Seq Scan on employer_manager (cost=0.00..61704.37 rows=3902987 width=183) (actual time=0.015..926.040 rows=3030730 loops=4)
Buffers: shared hit=226745
-> Parallel Hash (cost=12470696.28..12470696.28 rows=1173692 width=8) (actual time=400745.411..400757.615 rows=315704 loops=4)
Buckets: 1048576 Batches: 8 Memory Usage: 14432kB
Buffers: shared hit=10000933, temp read=5185403 written=9531604
-> Parallel Hash Semi Join (cost=5419572.63..12470696.28 rows=1173692 width=8) (actual time=381755.186..400673.065 rows=315704 loops=4)
Hash Cond: (e.employer_id = v.employer_id)
Buffers: shared hit=10000933, temp read=5185403 written=9527768
-> Parallel Seq Scan on employer e (cost=0.00..65876.37 rows=2986447 width=4) (actual time=0.015..547.631 rows=2313235 loops=4)
Buffers: shared hit=360119
-> Parallel Hash (cost=4389870.76..4389870.76 rows=94549378 width=4) (actual time=300367.731..300370.699 rows=10515574 loops=4)
Buckets: 1048576 (originally 1048576) Batches: 131072 (originally 512) Memory Usage: 287136kB
Buffers: shared hit=9640814, temp read=4068338 written=7441540
-> Parallel Hash Join (cost=1522656.32..4389870.76 rows=94549378 width=4) (actual time=113906.860..121650.966 rows=10515574 loops=4)
Hash Cond: (aspe.service_package_id = es.employer_service_id)
Buffers: shared hit=9640814, temp read=3570214 written=3573964
-> Parallel Hash Join (cost=1139337.57..3828356.61 rows=132999548 width=8) (actual time=87792.230..102392.631 rows=60107033 loops=4)
Hash Cond: (ase.account_service_package_entry_id = aspe.id)
Buffers: shared hit=8740464, temp read=2645196 written=2648224
-> Parallel Hash Join (cost=669912.02..3179027.68 rows=132999548 width=8) (actual time=44539.790..74472.198 rows=60107033 loops=4)
Hash Cond: (ase.object_id = v.vacancy_id)
Buffers: shared hit=7347306, temp read=1656073 written=1657576
-> Parallel Seq Scan on account_service_event ase (cost=0.00..2275263.13 rows=172006123 width=8) (actual time=0.011..20798.324 rows=133306300 loops=4)
Buffers: shared hit=5552019
-> Parallel Hash (cost=414267.15..414267.15 rows=23473845 width=8) (actual time=7923.194..7923.195 rows=18198520 loops=4)
Buckets: 1048576 Batches: 128 Memory Usage: 30528kB
Buffers: shared hit=1795287, temp written=247932
-> Parallel Seq Scan on vacancy v (cost=0.00..414267.15 rows=23473845 width=8) (actual time=0.011..5221.328 rows=18198520 loops=4)
Buffers: shared hit=1795287
-> Parallel Hash (cost=297333.92..297333.92 rows=15801812 width=8) (actual time=4352.140..4352.141 rows=12280833 loops=4)
Buckets: 1048576 Batches: 128 Memory Usage: 23296kB
Buffers: shared hit=1393158, temp written=167644
-> Parallel Seq Scan on account_service_package_entry aspe (cost=0.00..297333.92 rows=15801812 width=8) (actual time=0.015..2520.045 rows=12280833 loops=4)
Buffers: shared hit=1393158
-> Parallel Hash (cost=258846.24..258846.24 rows=11429325 width=4) (actual time=3424.467..3424.468 rows=8908616 loops=4)
Buckets: 1048576 Batches: 64 Memory Usage: 30080kB
Buffers: shared hit=900350, temp written=103252
-> Parallel Seq Scan on employer_service es (cost=0.00..258846.24 rows=11429325 width=4) (actual time=0.010..2087.361 rows=8908616 loops=4)
Filter: (price = 0)
Rows Removed by Filter: 3624132
Buffers: shared hit=900350
Planning:
Buffers: shared hit=236
Planning Time: 25.494 ms
Execution Time: 541359.085 ms
SELECT employer_id
FROM
employer e
WHERE
EXISTS (
SELECT 1
FROM employer_service es
JOIN account_service_package_entry aspe
ON (aspe.service_package_id = es.employer_service_id)
JOIN account_service_event ase
ON (ase.account_service_package_entry_id = aspe.id)
JOIN vacancy v
ON (ase.object_id = v.vacancy_id)
WHERE v.employer_id = e.employer_id AND es.price = 0
)
);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=12483578.55..12699754.67 rows=4951913 width=183) (actual time=402888.266..541201.083 rows=3456146 loops=1)
Workers Planned: 3
Workers Launched: 3
Buffers: shared hit=10227924, temp read=5404033 written=9746488
-> Parallel Hash Semi Join (cost=12483478.55..12650135.54 rows=1597391 width=183) (actual time=402879.754..403927.356 rows=864036 loops=4)
Hash Cond: (employer_manager.employer_id = v.employer_id)
Buffers: shared hit=10227924, temp read=5404033 written=9746488
-> Parallel Seq Scan on employer_manager (cost=0.00..61704.37 rows=3902987 width=183) (actual time=0.015..926.040 rows=3030730 loops=4)
Buffers: shared hit=226745
-> Parallel Hash (cost=12470696.28..12470696.28 rows=1173692 width=8) (actual time=400745.411..400757.615 rows=315704 loops=4)
Buckets: 1048576 Batches: 8 Memory Usage: 14432kB
Buffers: shared hit=10000933, temp read=5185403 written=9531604
-> Parallel Hash Semi Join (cost=5419572.63..12470696.28 rows=1173692 width=8) (actual time=381755.186..400673.065 rows=315704 loops=4)
Hash Cond: (e.employer_id = v.employer_id)
Buffers: shared hit=10000933, temp read=5185403 written=9527768
-> Parallel Seq Scan on employer e (cost=0.00..65876.37 rows=2986447 width=4) (actual time=0.015..547.631 rows=2313235 loops=4)
Buffers: shared hit=360119
-> Parallel Hash (cost=4389870.76..4389870.76 rows=94549378 width=4) (actual time=300367.731..300370.699 rows=10515574 loops=4)
Buckets: 1048576 (originally 1048576) Batches: 131072 (originally 512) Memory Usage: 287136kB
Buffers: shared hit=9640814, temp read=4068338 written=7441540
-> Parallel Hash Join (cost=1522656.32..4389870.76 rows=94549378 width=4) (actual time=113906.860..121650.966 rows=10515574 loops=4)
Hash Cond: (aspe.service_package_id = es.employer_service_id)
Buffers: shared hit=9640814, temp read=3570214 written=3573964
-> Parallel Hash Join (cost=1139337.57..3828356.61 rows=132999548 width=8) (actual time=87792.230..102392.631 rows=60107033 loops=4)
Hash Cond: (ase.account_service_package_entry_id = aspe.id)
Buffers: shared hit=8740464, temp read=2645196 written=2648224
-> Parallel Hash Join (cost=669912.02..3179027.68 rows=132999548 width=8) (actual time=44539.790..74472.198 rows=60107033 loops=4)
Hash Cond: (ase.object_id = v.vacancy_id)
Buffers: shared hit=7347306, temp read=1656073 written=1657576
-> Parallel Seq Scan on account_service_event ase (cost=0.00..2275263.13 rows=172006123 width=8) (actual time=0.011..20798.324 rows=133306300 loops=4)
Buffers: shared hit=5552019
-> Parallel Hash (cost=414267.15..414267.15 rows=23473845 width=8) (actual time=7923.194..7923.195 rows=18198520 loops=4)
Buckets: 1048576 Batches: 128 Memory Usage: 30528kB
Buffers: shared hit=1795287, temp written=247932
-> Parallel Seq Scan on vacancy v (cost=0.00..414267.15 rows=23473845 width=8) (actual time=0.011..5221.328 rows=18198520 loops=4)
Buffers: shared hit=1795287
-> Parallel Hash (cost=297333.92..297333.92 rows=15801812 width=8) (actual time=4352.140..4352.141 rows=12280833 loops=4)
Buckets: 1048576 Batches: 128 Memory Usage: 23296kB
Buffers: shared hit=1393158, temp written=167644
-> Parallel Seq Scan on account_service_package_entry aspe (cost=0.00..297333.92 rows=15801812 width=8) (actual time=0.015..2520.045 rows=12280833 loops=4)
Buffers: shared hit=1393158
-> Parallel Hash (cost=258846.24..258846.24 rows=11429325 width=4) (actual time=3424.467..3424.468 rows=8908616 loops=4)
Buckets: 1048576 Batches: 64 Memory Usage: 30080kB
Buffers: shared hit=900350, temp written=103252
-> Parallel Seq Scan on employer_service es (cost=0.00..258846.24 rows=11429325 width=4) (actual time=0.010..2087.361 rows=8908616 loops=4)
Filter: (price = 0)
Rows Removed by Filter: 3624132
Buffers: shared hit=900350
Planning:
Buffers: shared hit=236
Planning Time: 25.494 ms
Execution Time: 541359.085 ms
There are no big missestimations in rows, but during moments of high memory usage each parallel worker used over 5GB of RAM (so total over 20GB of RAM 3 worker + leader).
Memory context results of single worker:
TopMemoryContext: 3357032 total in 16 blocks; 46064 free (16 chunks); 3310968 used
...
TopPortalContext: 8192 total in 1 blocks; 7656 free (0 chunks); 536 used
PortalHoldContext: 24632 total in 2 blocks; 7384 free (0 chunks); 17248 used
PortalContext: 540672 total in 69 blocks; 8568 free (10 chunks); 532104 used:
ExecutorState: 1082567568 total in 124072 blocks; 4073608 free (7018 chunks); 1078493960 used
HashTableContext: 74808 total in 5 blocks; 25008 free (3 chunks); 49800 used
HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
HashTableContext: 74808 total in 5 blocks; 25056 free (3 chunks); 49752 used
HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
HashTableContext: 40960 total in 3 blocks; 12328 free (6 chunks); 28632 used
HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
HashTableContext: 4156883208 total in 124074 blocks; 146264 free (30 chunks); 4156736944 used
HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
HashTableContext: 8192 total in 1 blocks; 4376 free (7 chunks); 3816 used
HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
ExprContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
...
ExprContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
Relcache by OID: 16384 total in 2 blocks; 1424 free (2 chunks); 14960 used
CacheMemoryContext: 1163704 total in 14 blocks; 197832 free (0 chunks); 965872 used
relation rules: 24576 total in 6 blocks; 4896 free (1 chunks); 19680 used: pg_settings
index info: 2048 total in 2 blocks; 528 free (1 chunks); 1520 used: pg_toast_2619_index
....
48 more child contexts containing 91136 total in 83 blocks; 27632 free (45 chunks); 63504 used
PrivateRefCount: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
MdSmgr: 8192 total in 1 blocks; 3880 free (2 chunks); 4312 used
LOCALLOCK hash: 32768 total in 3 blocks; 16824 free (8 chunks); 15944 used
Timezones: 104120 total in 2 blocks; 2616 free (0 chunks); 101504 used
ErrorContext: 8192 total in 1 blocks; 7928 free (4 chunks); 264 used
Grand total: 5245755200 bytes in 248596 blocks; 4995280 free (7224 chunks); 5240759920 used
...
TopPortalContext: 8192 total in 1 blocks; 7656 free (0 chunks); 536 used
PortalHoldContext: 24632 total in 2 blocks; 7384 free (0 chunks); 17248 used
PortalContext: 540672 total in 69 blocks; 8568 free (10 chunks); 532104 used:
ExecutorState: 1082567568 total in 124072 blocks; 4073608 free (7018 chunks); 1078493960 used
HashTableContext: 74808 total in 5 blocks; 25008 free (3 chunks); 49800 used
HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
HashTableContext: 74808 total in 5 blocks; 25056 free (3 chunks); 49752 used
HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
HashTableContext: 40960 total in 3 blocks; 12328 free (6 chunks); 28632 used
HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
HashTableContext: 4156883208 total in 124074 blocks; 146264 free (30 chunks); 4156736944 used
HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
HashTableContext: 8192 total in 1 blocks; 4376 free (7 chunks); 3816 used
HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
ExprContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
...
ExprContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
Relcache by OID: 16384 total in 2 blocks; 1424 free (2 chunks); 14960 used
CacheMemoryContext: 1163704 total in 14 blocks; 197832 free (0 chunks); 965872 used
relation rules: 24576 total in 6 blocks; 4896 free (1 chunks); 19680 used: pg_settings
index info: 2048 total in 2 blocks; 528 free (1 chunks); 1520 used: pg_toast_2619_index
....
48 more child contexts containing 91136 total in 83 blocks; 27632 free (45 chunks); 63504 used
PrivateRefCount: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
MdSmgr: 8192 total in 1 blocks; 3880 free (2 chunks); 4312 used
LOCALLOCK hash: 32768 total in 3 blocks; 16824 free (8 chunks); 15944 used
Timezones: 104120 total in 2 blocks; 2616 free (0 chunks); 101504 used
ErrorContext: 8192 total in 1 blocks; 7928 free (4 chunks); 264 used
Grand total: 5245755200 bytes in 248596 blocks; 4995280 free (7224 chunks); 5240759920 used
Also during query execution there was huge amount of small temp files:
find ~postgres/13/main/base/pgsql_tmp/ -type f | wc -l
1459383
1459383
There are two questions:
1)is this behavior normal/expected?
2)is there any way to limit memory usage of such queries to prevent OOM in random times
(memory usage of almost 200x of work_mem very surprised me).
PS: after crash recovery of a database with a few millions of files in pgsql_tmp takes ages (however it is fixed in newer versions so it's not a critical issue).
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
Senior Postgresql DBA
https://dataegret.com/
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678