On Fri, Jan 10, 2014 at 12:19 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Preston Hagar <prestonh@xxxxxxxxx> writes:
>>> tl;dr: Moved from 8.3 to 9.3 and are now getting out of memory errors
>>> despite the server now having 32 GB instead of 4 GB of RAM and the workload
>>> and number of clients remaining the same.
> Here are a couple of examples from the incident we had this morning:
> 2014-01-10 06:14:40 CST 30176 LOG: could not fork new process for
> connection: Cannot allocate memory
> 2014-01-10 06:14:40 CST 30176 LOG: could not fork new process for
> connection: Cannot allocate memory
That's odd ... ENOMEM from fork() suggests that you're under system-wide
memory pressure.
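If it happens again while someone is watching, a sketch of one way to confirm that it really is the kernel refusing the allocation, assuming strace is installed and it is acceptable to attach it briefly to the production postmaster (pid 30176, per the log lines above):

strace -f -e trace=clone -p 30176 2>&1 | grep -i enomem   # a failed fork() shows up as clone(...) = -1 ENOMEM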
> [ memory map dump showing no remarkable use of memory at all ]
> 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production
> 10.1.1.6(36680)ERROR: out of memory
> 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production
> 10.1.1.6(36680)DETAIL: Failed on request of size 500.
I think that what you've got here isn't really a Postgres issue, but
a system-level configuration issue: the kernel is being unreasonably
stingy about giving out memory, and it's not clear why.
It might be worth double-checking that the postmaster is not being
started under restrictive ulimit settings; though offhand I don't
see how that theory could account for fork-time failures, since
the ulimit memory limits are per-process.
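For what it's worth, one quick way to check that (a sketch only; pid 30176 is taken from the postmaster log lines above, and the data-directory path is just the stock Ubuntu layout, so adjust both as needed):

cat /proc/30176/limits                                # limits actually in effect for the running postmaster
head -1 /var/lib/postgresql/9.3/main/postmaster.pid   # first line is the postmaster pid, if you need to look it up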
Other than that, you need to burrow around in the kernel settings
and see if you can find something there that's limiting how much
memory it will give to Postgres. It might also be worth watching
the kernel log when one of these problems starts. Plain old "top"
might also be informative as to how much memory is being used.
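A few concrete checks along those lines (generic Linux, nothing Postgres-specific):

sysctl vm.overcommit_memory vm.overcommit_ratio    # current overcommit policy
grep -E 'Commit|Swap|MemFree' /proc/meminfo        # CommitLimit vs. Committed_AS, swap, free memory
dmesg | tail -n 50                                 # any allocation failures or OOM-killer activity
ps aux --sort=-rss | head -n 15                    # biggest resident processes, much like top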
Thanks for the response. I think it might have been the lack of a swapfile (I replied as such in another response).
That said, we have been using this site as a guide to try to figure things out about postgres and memory:

We came up with the following for all our current processes (we aren't out of memory and new connections are being accepted right now, but memory seems low):

1. List of RSS usage for all postgres processes:

2. List of all memory segments for the postgres checkpoint process (pid 30178):

grep -B1 -E '^Size: *[0-9]{6}' /proc/30178/smaps

7f208acec000-7f2277328000 rw-s 00000000 00:04 31371473   /dev/zero (deleted)
Size:            8067312 kB

3. Info on the largest memory allocation for the postgres checkpoint process. It is using 5 GB of RAM privately.

cat /proc/30178/smaps | grep 7f208acec000 -B 0 -A 20

Total RSS: 11481148
7f208acec000-7f2277328000 rw-s 00000000 00:04 31371473   /dev/zero (deleted)
Size:            8067312 kB
Rss:             5565828 kB
Pss:             5284432 kB
Shared_Clean:          0 kB
Shared_Dirty:     428840 kB
Private_Clean:         0 kB
Private_Dirty:   5136988 kB
Referenced:      5559624 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
7f2277328000-7f22775f1000 r--p 00000000 09:00 2889301   /usr/lib/locale/locale-archive
Size:               2852 kB
Rss:                   8 kB
Pss:                   0 kB
Shared_Clean:          8 kB
Shared_Dirty:          0 kB

If I am understanding all this correctly, the postgres checkpoint process has around 5 GB of RAM "Private_Dirty" allocated (not shared buffers). Is this normal? Any thoughts as to why this would get so high?

I'm still trying to dig in further to figure out exactly what is going on. We are running on Ubuntu 12.04.3 (kernel 3.5.0-44). We set vm.overcommit_memory = 2 but didn't have a swap partition; we have since added one and are seeing if that helps.
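If I have the overcommit arithmetic right (this is generic Linux behaviour, nothing Postgres-specific): with vm.overcommit_memory = 2 and the default vm.overcommit_ratio = 50, the kernel caps total committed address space at swap plus 50% of RAM. With 32 GB of RAM and no swap, that is only about 16 GB of commit limit to cover shared_buffers plus every backend, which would explain fork() and malloc() failing long before physical memory runs out. The relevant counters:

grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo   # the cap vs. what is currently committed
# with vm.overcommit_memory = 2:  CommitLimit = SwapTotal + MemTotal * vm.overcommit_ratio / 100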
>> We had originally copied our shared_buffers, work_mem, wal_buffers and
>> other similar settings from our old config, but after getting the memory
>> errors have tweaked them to the following:
>
> shared_buffers = 7680MB
> temp_buffers = 12MB
> max_prepared_transactions = 0
> work_mem = 80MB
> maintenance_work_mem = 1GB
> wal_buffers = 8MB
> max_connections = 350
That seems like a dangerously large work_mem for so many connections;
but unless all the connections were executing complex queries, which
doesn't sound to be the case, that isn't the immediate problem.
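For rough scale (back-of-the-envelope only; a single query can use several work_mem-sized allocations, one per sort or hash step, so these are lower bounds rather than hard caps):

350 connections * 80 MB work_mem = ~27 GB that sorts and hashes alone could request
350 connections * 45 MB work_mem = ~15 GB, on top of shared_buffers = 7680 MB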
Thanks for the heads up. We originally arrived at that value using pgtune with, I think, 250 connections, and I forgot to lower work_mem when I raised max_connections. I now have it set to 45 MB; does that seem more reasonable?
>> The weird thing is that our old server had 1/8th the RAM, was set to
>> max_connections = 600 and had the same clients connecting in the same way
>> to the same databases and we never saw any errors like this in the several
>> years we have been using it.
This reinforces the impression that something's misconfigured at the
kernel level on the new server.
regards, tom lane
Forgot to copy the list on my reply, so I am resending it here.
Thanks for your help and time.

Preston