david@xxxxxxx wrote:
On Wed, 27 Aug 2008, Craig James wrote:
The OOM killer is a terrible idea for any serious database server. I
wrote a detailed technical paper on this almost 15 years ago when
Silicon Graphics had this same feature, and Oracle and other critical
server processes couldn't be made reliable.
The problem with "overallocating memory" as Linux does by default is
that EVERY application, no matter how well designed and written,
becomes unreliable: It can be killed because of some OTHER process.
You can be as clever as you like, and do all the QA possible, and
demonstrate that there isn't a single bug in Postgres, and it will
STILL be unreliable if you run it on a Linux system that allows
overcommitted memory.
IMHO, all Postgres servers should run with memory-overcommit
disabled. On Linux, that means /proc/sys/vm/overcommit_memory=2.
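For reference, the usual way to switch on strict accounting looks like this (the ratio value of 100 is just an example; it controls what fraction of physical RAM counts toward the commit limit on top of swap):

```shell
# Enable strict overcommit accounting (mode 2) at runtime.
sysctl -w vm.overcommit_memory=2
# Commit limit = swap + overcommit_ratio% of RAM; 100 is an example value.
sysctl -w vm.overcommit_ratio=100

# To persist across reboots, add to /etc/sysctl.conf:
#   vm.overcommit_memory = 2
#   vm.overcommit_ratio = 100
```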
It depends on how much stuff you allow others to run on the box. If you have no control over that, then yes, the box is unreliable (but not just because of the OOM killer; those other users can eat up the box's other resources as well: CPU, network bandwidth, disk bandwidth, etc.)
Even with overcommit disabled, the only way you can be sure that a program will not fail is to make sure that it never needs to allocate memory. With overcommit off, you could have one program that eats up 100% of your RAM without failing (by handling the error on memory allocation so that it doesn't crash), but which will cause _every_ other program on the system to fail, including any scripts: every command executed requires a fork, and without overcommit that fork must reserve as much memory as your shell has already allocated before it can run even a trivial command (like the ps or kill you are trying to use to fix the problem).
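The accounting behind this is visible directly in /proc/meminfo: under strict overcommit, any allocation (including the implicit one a fork makes) fails once Committed_AS would exceed CommitLimit. A quick way to watch the headroom:

```shell
# Show the kernel's commit limit and the total address space committed
# so far. Under vm.overcommit_memory=2, allocations start failing once
# Committed_AS would exceed CommitLimit.
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo
```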
If you have a box with unpredictable memory use, disabling overcommit will not make it reliable. It may make it less unreliable (the fact that the Linux OOM killer will pick one of the worst possible processes to kill is a problem), but less unreliable is not the same as reliable.
The problem with any argument in favor of memory overcommit and OOM is that there is a MUCH better, and simpler, solution. Buy a really big disk, say a terabyte, and allocate the whole thing as swap space. Then do a decent job of configuring your kernel so that any reasonable process can allocate huge chunks of memory that it will never use, but can't use the whole terabyte.
Using real swap space instead of overallocated memory is a much better solution.
- It's cheap.
- There is no performance hit at all if you buy enough real memory.
- If runaway processes start actually using memory, the system slows
  down, but server processes like Postgres *aren't killed*.
- When a runaway process pushes everything into swap, you can just
  find it and kill it. Once it's dead, everything else goes back
  to normal.
It's hard to imagine a situation where any program or collection of programs would actually try to allocate more than a terabyte of memory and exceed the swap space on a single one-terabyte disk. The cost is almost nothing, a few hundred dollars.
So turn off overcommit, and buy an extra disk if you actually need a lot of "virtual memory".
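For what it's worth, the big-swap setup is just the standard swap tooling; /dev/sdb1 below is a hypothetical device name, so substitute whatever your spare disk actually is:

```shell
# WARNING: mkswap destroys whatever is on the partition.
# /dev/sdb1 is a hypothetical device name; use your actual spare disk.
mkswap /dev/sdb1        # write a swap signature to the partition
swapon /dev/sdb1        # start using it as swap immediately
swapon -s               # verify; also visible in /proc/swaps

# To make it permanent, add a line like this to /etc/fstab:
#   /dev/sdb1  none  swap  sw  0  0
```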
Craig