On Thu, Mar 3, 2022 at 05:59, Marc Rechté <marc4@xxxxxxxxx> wrote:
Hello,
We have a pg_restore which fails due to RAM over-consumption by
the corresponding PG backend, which ends up being killed by the OOM killer.
The table has one PK, one index, and 3 FK constraints, active
while restoring.
The dump contains over 200M rows for that table and is in custom
format, which corresponds to 37 GB of total relation size in the
original DB.
While importing, one can see RSS + swap increasing linearly for
the backend executing the COPY.
On my machine (a quite old PC), it failed after 16 hours, when
disk usage had reached 26 GB and memory usage was 9.1 GB (RSS + swap).
If we run the same test after first dropping the 5 constraints on
the table, the restore takes less than 15 minutes! (See the sketch
below.)
This was tested on both PG 14.2 and PG 13.6 (Linux 64-bit machines).
Is there a memory leak, or is it normal that a backend process may
exhaust RAM to such an extent?
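
A minimal sketch of the workaround described above (dropping the
constraints before the data load and re-creating them afterwards).
The table and database names come from the log below; the constraint,
index, and column names, as well as the dump file name, are
hypothetical placeholders: check the real ones with
"\d simulations_ecarts_relatifs_saison" in psql.

-- Drop the PK, the extra index, and the 3 FK constraints before the
-- load (constraint and index names below are placeholders).
ALTER TABLE simulations_ecarts_relatifs_saison
    DROP CONSTRAINT simulations_ecarts_relatifs_saison_pkey,
    DROP CONSTRAINT simulations_ecarts_relatifs_saison_idpoint_fkey,
    DROP CONSTRAINT simulations_ecarts_relatifs_saison_idreferentiel_fkey,
    DROP CONSTRAINT simulations_ecarts_relatifs_saison_saison_fkey;
DROP INDEX simulations_ecarts_relatifs_saison_annee_idx;

-- Load only this table's data from the custom-format dump
-- (file name is a placeholder):
--   pg_restore --data-only --table=simulations_ecarts_relatifs_saison \
--       -d drias dump.custom

-- Re-create the index and constraints once the COPY has finished,
-- e.g. the primary key (column list is a placeholder):
ALTER TABLE simulations_ecarts_relatifs_saison
    ADD CONSTRAINT simulations_ecarts_relatifs_saison_pkey
    PRIMARY KEY (idpoint, annee, saison, idreferentiel);
-- The FK constraints and the index would be re-added the same way.
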
Hi Marc,
Can you post the server logs?
regards,
Ranier Vilela
Will it help?
2022-02-25 12:01:29.306 GMT [1468:24] user=,db=,app=,client= LOG:
server process (PID 358995) was terminated by signal 9: Killed
2022-02-25 12:01:29.306 GMT [1468:25] user=,db=,app=,client= DETAIL:
Failed process was running: COPY simulations_ecarts_relatifs_saison
(idpoint, annee, saison, idreferentiel, ecartreltav, ecartreltnav,
ecartreltxav, ecartreltrav, ecartreltxq90, ecartreltxq10, ecartreltnq10,
ecartreltnq90, ecartreltxnd, ecartreltnnd, ecartreltnht, ecartreltxhwd,
ecartreltncwd, ecartreltnfd, ecartreltxfd, ecartrelsd, ecartreltr,
ecartrelhdd, ecartrelcdd, ecartrelpav, ecartrelpint, ecartrelrr,
ecartrelpfl90, ecartrelrr1mm, ecartrelpxcwd, ecartrelpn20mm,
ecartrelpxcdd, ecartrelhusav, ecartreltx35, ecartrelpq90, ecartrelpq99,
ecartrelrr99, ecartrelffav, ecartrelff3, ecartrelffq98, ecartrelff98)
FROM stdin;
2022-02-25 12:01:29.306 GMT [1468:26] user=,db=,app=,client= LOG:
terminating any other active server processes
2022-02-25 12:01:29.311 GMT [1468:27] user=,db=,app=,client= LOG: all
server processes terminated; reinitializing
2022-02-25 12:01:29.326 GMT [360309:1] user=,db=,app=,client= LOG:
database system was interrupted; last known up at 2022-02-25 12:01:12 GMT
2022-02-25 12:01:29.362 GMT [360310:1]
user=[unknown],db=[unknown],app=[unknown],client=[local] LOG: connection
received: host=[local]
2022-02-25 12:01:29.363 GMT [360310:2]
user=postgres,db=drias,app=[unknown],client=[local] FATAL: the database
system is in recovery mode
2022-02-25 12:01:29.365 GMT [360309:2] user=,db=,app=,client= LOG:
database system was not properly shut down; automatic recovery in progress
2022-02-25 12:01:29.367 GMT [360309:3] user=,db=,app=,client= LOG: redo
starts at C3/1E0D31F0
2022-02-25 12:01:40.845 GMT [360309:4] user=,db=,app=,client= LOG: redo
done at C3/6174BC00 system usage: CPU: user: 4.15 s, system: 1.40 s,
elapsed: 11.47 s
2022-02-25 12:01:40.847 GMT [360309:5] user=,db=,app=,client= LOG:
checkpoint starting: end-of-recovery immediate
2022-02-25 12:01:41.806 GMT [360309:6] user=,db=,app=,client= LOG:
checkpoint complete: wrote 125566 buffers (100.0%); 0 WAL file(s) added,
54 removed, 13 recycled; write=0.915 s, sync=0.001 s, total=0.960 s;
sync files=10, longest=0.001 s, average=0.001 s; distance=1104355 kB,
estimate=1104355 kB
2022-02-25 12:01:41.810 GMT [1468:28] user=,db=,app=,client= LOG:
database system is ready to accept connections