On 10 January 2014, 19:19, Tom Lane wrote:
> Preston Hagar <prestonh@xxxxxxxxx> writes:
>>>> tl;dr: Moved from 8.3 to 9.3 and are now getting out of memory errors
>>>> despite the server now having 32 GB instead of 4 GB of RAM and the
>>>> workload and number of clients remaining the same.
>
>> Here are a couple of examples from the incident we had this morning:
>> 2014-01-10 06:14:40 CST 30176 LOG: could not fork new process for
>> connection: Cannot allocate memory
>> 2014-01-10 06:14:40 CST 30176 LOG: could not fork new process for
>> connection: Cannot allocate memory
>
> That's odd ... ENOMEM from fork() suggests that you're under system-wide
> memory pressure.
>
>> [ memory map dump showing no remarkable use of memory at all ]
>> 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production
>> 10.1.1.6(36680)ERROR: out of memory
>> 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production
>> 10.1.1.6(36680)DETAIL: Failed on request of size 500.
>
> I think that what you've got here isn't really a Postgres issue, but
> a system-level configuration issue: the kernel is being unreasonably
> stingy about giving out memory, and it's not clear why.
>
> It might be worth double-checking that the postmaster is not being
> started under restrictive ulimit settings; though offhand I don't
> see how that theory could account for fork-time failures, since
> the ulimit memory limits are per-process.
>
> Other than that, you need to burrow around in the kernel settings
> and see if you can find something there that's limiting how much
> memory it will give to Postgres. It might also be worth watching
> the kernel log when one of these problems starts. Plain old "top"
> might also be informative as to how much memory is being used.

My bet is on overcommit - what are vm.overcommit_memory and
vm.overcommit_ratio set to? And do you have any swap configured?

I've repeatedly run into very similar OOM issues on machines with
overcommit disabled (overcommit_memory=2) and no swap. There was plenty
of RAM available (either free or sitting in the page cache), but when
demand peaked suddenly, the allocations failed. vm.swappiness also seems
to play a role in this.

>>> The weird thing is that our old server had 1/8th the RAM, was set to
>>> max_connections = 600 and had the same clients connecting in the same
>>> way to the same databases and we never saw any errors like this in the
>>> several years we have been using it.

Chances are the old machine had swap, had overcommit enabled and/or used
a higher swappiness, so it never ran into these limits.

Anyway, I see you've mentioned shmmax/shmall in one of your previous
messages. I'm pretty sure those are irrelevant to this problem, because
they only affect the allocation of shared memory (i.e. shared buffers)
at startup. If the database starts OK, the cause lies elsewhere.

kind regards
Tomas Vondra
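
PS: A quick way to see whether this is what's biting you - a rough sketch
of the checks I'd run on the new box (the 32 GB figure comes from your
earlier message; the default overcommit_ratio of 50 is just an assumption,
check what yours actually says):

    # current overcommit and swappiness settings
    sysctl vm.overcommit_memory vm.overcommit_ratio vm.swappiness

    # how much the kernel will allow to be committed vs. what is committed now
    grep -E 'CommitLimit|Committed_AS' /proc/meminfo

    # is there any swap at all?
    free -m

    # ulimits the running postmaster actually inherited
    # (<postmaster pid> is a placeholder - take it from postmaster.pid or ps)
    cat /proc/<postmaster pid>/limits

With overcommit_memory=2 the kernel enforces roughly
CommitLimit = swap + RAM * overcommit_ratio/100, so with 32 GB of RAM,
no swap and the default ratio of 50 you can only commit about 16 GB in
total - fork() and backend allocations start failing long before the
memory is really exhausted. If Committed_AS is close to CommitLimit when
the errors appear, adding some swap, raising overcommit_ratio, or going
back to overcommit_memory=0 should make the symptoms go away.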