Re: pg_restore out of memory

Sameer Kumar <sameer.kumar@xxxxxxxxxx> · Tue, 12 Jul 2016 13:08:02 +0000

On Tue, 12 Jul 2016, 7:25 p.m. Miguel Ramos, <org.postgresql@xxxxxxxxxxxxxxxxx> wrote:

Hi,

We have backed up a database and now when trying to restore it to the

same server we get this:

 > # pg_restore -d recovery /mnt/paysdeloire2013_convertida2.1.dump

 > pg_restore: [custom archiver] out of memory

 > 12:09:56.58  9446.593u+1218.508s 24.3%  167+2589k  6+0io  0pf+0sw

6968822cs

Some information about the application:

- We have sensor data, including pictures, and number crunshing output,

then so the large tables on this database have 319, 279, 111 and 26GB.

Mostly on TOAST pages, but the 279GB one divides it evenly. This

database is 765GB. We try to keep them under 4TB.

- Transactions are large, some 100 MB at a time.

- We also use PostGIS.

About the server (dedicated):

- FreeBSD 9.1-RELEASE #0 on AMD64

- 16 GB of RAM

- 8x3GB hardware RAID 10

- 10TB slice for pgdata UFS-formatted and 32kB block

- PostgreSQL 9.1.8 custom compiled to get 32kB blocks

- Installed in 2013 with ~10 people working with it, 145 days uptime today.

I found two relevant threads on the mailing-lists.

The most recent one sugested that postgresql was being configured to use

more memory than what's available.

The older one sugested that the system limits on the size of the data or

stack segments were lower than required.

So here are some server parameters (relevant or otherwise):

 > max_connections = 100

 > shared_buffers = 4GB  -- 25% of RAM

 > temp_buffers = 32MB  -- irrelevant?

 > work_mem = 64MB

 > maintenance_work_mem = was 1G lowered to 256M then 64M

Why did you lower it? I think increasing it should help better. But 1GB seems like fine.

 > wal_buffers = -1  -- should mean 1/32 of shared_buffers = 128MB

Increase this during the restore, may be 512MB

 > checkpoint_segments = 64  -- WAL segments are 16MB

 > effective_cache_size = 8GB  -- irrelevant?

I suspect that the restore fails when constructing the indices. After

the process is aborted, the data appears to be all or most there, but no

indices.

What is logged in database log files? Have you checked that?

So, all I did so far, was lowering maintenance_work_mem and it didn't work.

System limits, as you can see, are at defaults (32GB for data and 512MB

for stack):

 > # limit

 > cputime      unlimited

 > filesize     unlimited

 > datasize     33554432 kbytes

 > stacksize    524288 kbytes

 > coredumpsize unlimited

 > memoryuse    unlimited

 > vmemoryuse   unlimited

 > descriptors  11095

 > memorylocked unlimited

 > maxproc      5547

 > sbsize       unlimited

 > swapsize     unlimited

Shared memory is configured to allow for the single shared memory

segment postgresql appears to use, plus a bit of extra (8GB):

 > # ipcs -M

 > shminfo:

 >         shmmax:   8589934592    (max shared memory segment size)

 >         shmmin:            1    (min shared memory segment size)

 >         shmmni:          192    (max number of shared memory identifiers)

 >         shmseg:          128    (max shared memory segments per process)

 >         shmall:      2097152    (max amount of shared memory in pages)

And semaphores (irrelevant?)...

 > # ipcs -S

 > seminfo:

 >         semmni:          256    (# of semaphore identifiers)

 >         semmns:          512    (# of semaphores in system)

 >         semmnu:          256    (# of undo structures in system)

 >         semmsl:          340    (max # of semaphores per id)

 >         semopm:          100    (max # of operations per semop call)

 >         semume:           50    (max # of undo entries per process)

 >         semusz:          632    (size in bytes of undo structure)

 >         semvmx:        32767    (semaphore maximum value)

 >         semaem:        16384    (adjust on exit max value)

What are your vm.dirty_ratio and vm.dirty_background_ratio? I think reducing them may help. But can not really say what exactly would help unless you are able to get the error source in db logs

I don't know what else to try.

I lowered maintenance_work_mem without restarting the server.

In some of the attempts, but not all, the restore was done while people

were working.

Each attempt takes 12 hours...

We couldn't use the directory -Fd dump because it's postgresql 9.1.

The original database is still on the server, this is a test restore.

We have about one or two months of slack before we really need to remove

them from the server to recover space.

--

Miguel Ramos

--

Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)

To make changes to your subscription:

http://www.postgresql.org/mailpref/pgsql-general

-- 
--
Best Regards
Sameer Kumar | DB Solution Architect 
ASHNIK PTE. LTD.
101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533
T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com