Re: Memory Overcommit

Tom Lane <tgl@xxxxxxxxxxxxx> · Thu, 07 Jun 2012 12:14:30 -0400

Andy Chambers <achambers@xxxxxxxx> writes:
> We've just run into the dreaded "OOM Killer".  I see that on Linux
> 2.6, it's recommended to turn off memory overcommit.  I'm trying to
> understand the implications of doing this.  The interweb says this
> means that forking servers can't make use of "copy on write"
> semantics.  Is this true?

Don't know where you read that, but it's nonsense AFAIK.

The actual issue here is that when a process fork()s, initially the
child shares all the pages of the parent process.  Over time, both the
child and the parent will dirty pages that had been shared, forcing a
copy-on-write to happen, after which there's a separate copy of such
pages for each process.  So if the parent had N pages, the ultimate
memory requirement will be for something between N and 2N pages, and
there's not a very good way to know in advance what it will be.
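For a rough feel for the N-to-2N range, here is a back-of-the-envelope model.  It assumes a single parent with N private pages and one child, ignores page tables and other kernel overhead, and uses a hypothetical function name.  It also assumes that each shared page is copied at most once: after the first writer takes its private copy, the other process is left as the only user of the original page and can keep writing it in place.

```python
from typing import Iterable


def cow_total_pages(n_pages: int,
                    parent_dirtied: Iterable[int],
                    child_dirtied: Iterable[int]) -> int:
    """Frames needed after fork() once the given shared pages have been written.

    Hypothetical helper: page numbers are indexes 0..n_pages-1 into the
    parent's address space at the moment of the fork.
    """
    written = set(parent_dirtied) | set(child_dirtied)
    if any(p < 0 or p >= n_pages for p in written):
        raise ValueError("page index out of range")
    # N original frames, plus one private copy per distinct page written
    # by either process while it was still shared.
    return n_pages + len(written)
```

A child that only reads costs nothing extra (the result stays at N), and the worst case of every page being written by someone gives 2N.  The hard part, as noted above, is that nobody knows the written set at fork time.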

Now the problem the kernel has is, what if a COW needs to happen and it
has no place to put the new page?  It cannot report an ENOMEM failure
because the process is not making a failable kernel call, it's just
writing some memory that it has every reason to think it can write.
About all the kernel can do is terminate that process, i.e., OOM-kill it.

The only way to be certain an OOM kill cannot happen is if you reserve N
pages' worth of memory/swap space for the child process when you do the
fork (since then you can fail the fork call, if there's not that much
available).  You can still do COW rather than physically duplicating the
whole address space right away, but you have to "bank" enough spare
space to be sure there will be room when and if the time comes.
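The difference between "banking" frames at fork time and not banking them can be shown with a toy page-accounting model.  Everything here (the `ToyKernel` and `OOMKilled` names, the single pool of frames) is hypothetical and much simpler than real Linux accounting; it only illustrates where the failure surfaces in each policy.

```python
import errno
import os


class OOMKilled(Exception):
    """Stands in for the OOM killer: no free frame for a COW copy."""


class ToyKernel:
    """Hypothetical model of physical frames; not the real Linux implementation."""

    def __init__(self, frames: int, strict: bool) -> None:
        self.free = frames    # physical frames not holding any page
        self.reserved = 0     # free frames already promised to future COW copies
        self.strict = strict  # True models "overcommit turned off"

    def load(self, pages: int) -> None:
        """Set-up helper: a process already holds `pages` resident pages."""
        if pages > self.free - self.reserved:
            raise ValueError("not enough frames for this toy setup")
        self.free -= pages

    def fork(self, parent_pages: int) -> None:
        """fork(): no page is copied yet; parent and child share everything."""
        if self.strict:
            # Bank one frame for every shared page that might later be copied,
            # so that any failure happens here, as a clean ENOMEM from fork().
            if parent_pages > self.free - self.reserved:
                raise OSError(errno.ENOMEM, os.strerror(errno.ENOMEM))
            self.reserved += parent_pages

    def cow_fault(self) -> None:
        """Some process wrote to a shared page and needs a private copy."""
        if self.strict:
            if self.reserved == 0:
                raise RuntimeError("more COW faults than pages banked at fork")
            self.reserved -= 1  # spend a frame banked at fork time
        elif self.free == 0:
            # No system call is in progress, so there is nowhere to return
            # ENOMEM; the only option left is to kill a process.
            raise OOMKilled("no frame for copy-on-write")
        self.free -= 1
```

With 150 frames and a 100-page parent, the non-strict kernel lets `fork(100)` succeed and then raises `OOMKilled` on the 51st COW fault, while the strict kernel refuses the fork itself with ENOMEM.  Given 200 frames, the strict kernel accepts the fork and every later COW fault is guaranteed a frame.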

"Overcommit" simply means that the kernel doesn't do such conservative
advance reservation, and so it might be forced into an OOM kill.
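On Linux the policy is the `vm.overcommit_memory` sysctl, whose documented values are 0 (heuristic overcommit, the default), 1 (always overcommit) and 2 (strict accounting, i.e. overcommit turned off); switching to strict mode is typically done with `sysctl -w vm.overcommit_memory=2`.  A small sketch for checking the current setting follows; the function names are hypothetical, and the reader only works on Linux.

```python
from pathlib import Path

# Meanings of vm.overcommit_memory as described in the Linux kernel docs.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default): only obviously excessive requests fail",
    1: "always overcommit: no reservation is ever checked",
    2: "strict accounting: commitments beyond CommitLimit fail with ENOMEM",
}


def describe_overcommit_mode(raw: str) -> str:
    """Translate the contents of /proc/sys/vm/overcommit_memory into words."""
    mode = int(raw.strip())
    try:
        return OVERCOMMIT_MODES[mode]
    except KeyError:
        raise ValueError(f"unexpected overcommit_memory value: {mode}") from None


def current_overcommit_mode(path: str = "/proc/sys/vm/overcommit_memory") -> str:
    """Linux-only: read and describe the running kernel's overcommit policy."""
    return describe_overcommit_mode(Path(path).read_text())
```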

The downside of turning off overcommit is that you will have pretty
severe under-utilization of your memory, since in practice a lot of a
process's address space is read-only and can be shared indefinitely by
parent and child.  This can usually be alleviated by providing a lot of
swap space that you expect won't get used.  Of course, if your tuning
calculations are off and the swap does start getting used a lot,
performance goes to hell in a handbasket.  So it's a tradeoff --- do you
want to keep running but possibly slowly, or are you willing to cope
with OOM kills for better average utilization of your hardware?
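Both halves of that tradeoff can be put into numbers.  In strict mode, Linux refuses commitments beyond CommitLimit, which (unless `vm.overcommit_kbytes` is set) is approximately (RAM - huge pages) * `vm.overcommit_ratio` / 100 + swap, with a default ratio of 50.  The sketch below approximates that formula in KiB (the kernel works in pages) and computes the remaining headroom from `/proc/meminfo`-style text; the function names are hypothetical.

```python
def commit_limit_kib(mem_total_kib: int, swap_total_kib: int,
                     overcommit_ratio: int = 50, hugetlb_kib: int = 0) -> int:
    """Approximate CommitLimit when vm.overcommit_kbytes is not set."""
    return (mem_total_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_total_kib


def commit_headroom_kib(meminfo_text: str) -> int:
    """CommitLimit minus Committed_AS, parsed from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            # Values look like "8123456 kB"; keep only the number.
            fields[key.strip()] = int(rest.split()[0])
    return fields["CommitLimit"] - fields["Committed_AS"]
```

With 16 GiB of RAM, no swap and the default ratio, only 8 GiB can be committed, which is the under-utilization described above; adding 16 GiB of swap that you hope never to touch raises the limit to 24 GiB.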

			regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)