On Thu, 28 Aug 2008, Craig James wrote:
> If your processes do use the memory, then your performance goes into the toilet, and you know it's time to buy more memory or a second server, but in the meantime your server processes at least keep running while you kill the rogue processes.
I'd argue against swap ALWAYS being better than overcommit. It's a choice between your performance going into the toilet or your processes dying.
On the one hand, if someone fork-bombs you, the OOM killer has a chance of solving the problem for you, rather than you having to log onto an unresponsive machine to kill the process yourself. On the other hand, the OOM killer may kill the wrong thing. Depending on what else you use your machine for, either of the choices may be the right one.
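On Linux, the choice between these two failure modes is made with the vm.overcommit_memory sysctl. The Python sketch below reads and describes the current setting; the function names are hypothetical, and the mode descriptions are a paraphrase of the kernel's documentation for that sysctl rather than a quotation.

```python
from pathlib import Path

# Paraphrase of the Linux kernel's documentation for vm.overcommit_memory.
OVERCOMMIT_MODES = {
    0: "heuristic: obvious overcommits are refused, others allowed (default)",
    1: "always overcommit: allocations are not checked up front; "
       "the OOM killer deals with any shortage later",
    2: "never overcommit: total commit is limited to swap plus "
       "overcommit_ratio percent of RAM, so allocations fail instead",
}


def parse_overcommit(text):
    """Turn the contents of the sysctl file into (mode, description)."""
    mode = int(text.strip())
    return mode, OVERCOMMIT_MODES.get(mode, "unknown mode")


def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Read the current policy, or return None if it is not available."""
    try:
        return parse_overcommit(Path(path).read_text())
    except (OSError, ValueError):
        return None
```

Mode 2 corresponds to the "swap" side of the argument above: with enough swap to back the commit limit, a greedy process gets an allocation error (or the machine slows down as it swaps) instead of something being OOM-killed.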
Another point is that, from a business perspective, a database that has stopped responding is equally bad whether the cause is the OOM killer or a thrashing machine. In both cases there is a maximum throughput that the machine can handle, and if requests arrive faster than that, the system will collapse, especially once requests start timing out and being retried.
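A toy simulation makes the collapse concrete. It is a sketch under simplifying assumptions, not a model of PostgreSQL: a fixed number of arrivals and a fixed service capacity per tick, first-in-first-out service, and clients that retry once a request has waited `timeout` ticks while the server keeps working on the abandoned original. The function name `simulate` is hypothetical.

```python
from collections import deque


def simulate(rate, capacity, timeout, ticks):
    """Return the server's backlog after each tick of a toy queueing model."""
    queue = deque()  # ages (in ticks) of requests still waiting, oldest first
    backlog = []
    retries = 0
    for _ in range(ticks):
        # New requests plus retries of requests that timed out last tick.
        queue.extend([0] * (rate + retries))
        # The server handles up to `capacity` requests per tick, oldest first.
        for _ in range(min(capacity, len(queue))):
            queue.popleft()
        # Everything still waiting gets one tick older. A request reaching the
        # timeout is retried by its client, but the original stays queued
        # because the server does not know the client gave up.
        queue = deque(age + 1 for age in queue)
        retries = sum(1 for age in queue if age == timeout)
        backlog.append(len(queue))
    return backlog


print(simulate(12, 10, 1000, 5))  # [2, 4, 6, 8, 10]
```

Below capacity the backlog stays at zero. Above it, the backlog grows by the excess every tick even with no retries, and as soon as requests start timing out it grows faster still, because each retry adds load without removing any.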
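This problem really is caused by the kernel not having enough information about how much memory a process is going to use. I would be much in favour of replacing fork() with a more informative system call, for example forkandexec() instead of fork() followed by exec(). After fork(), the child shares all of the parent's pages copy-on-write, and the kernel cannot tell whether the child will write to them or call exec() immediately, so it must either reserve memory for a full copy or overcommit. With forkandexec(), the kernel would know that the new process will never need any of that duplicated RAM. However, there is *far* too much legacy code built on the old fork() call to change that now.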
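POSIX does define a combined call along these lines, posix_spawn(), which Python exposes as os.posix_spawn (Python 3.8 and later, Unix only). The sketch below contrasts it with the fork()-then-exec() pattern; the helper names `run_fork_exec` and `run_spawn` are hypothetical. Whether posix_spawn() actually avoids committing memory for a copy of the parent depends on the C library's implementation, so treat it as a way of stating intent to the system rather than a guaranteed fix.

```python
import os
import sys


def run_fork_exec(argv):
    """Start argv with fork() + execv() and return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: until execv() succeeds it shares the parent's pages
        # copy-on-write, and under strict overcommit accounting the kernel
        # must reserve memory as if they might all be copied.
        try:
            os.execv(argv[0], argv)
        finally:
            os._exit(127)  # only reached if execv() failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def run_spawn(argv):
    """Start argv with posix_spawn(): fork and exec as a single request."""
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


cmd = [sys.executable, "-c", "raise SystemExit(3)"]
print(run_fork_exec(cmd), run_spawn(cmd))  # 3 3
```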
Likewise, I would be all for Postgres managing its memory better. It would be very nice to be able to set a maximum total amount of work memory for the whole server, rather than a maximum amount per backend. Each backend could then make do with whatever is left in the work-memory pool when it actually executes a query. As it is, the server admin has no idea how many multiples of work_mem are actually going to be used, even knowing the maximum number of backends, since work_mem applies to each sort or hash operation and a single query can run several of them.
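A server-wide pool of the kind described could look roughly like the following. It is a single-process Python sketch of a hypothetical feature, not anything PostgreSQL provides: the class and method names are invented, and a real implementation would need to live in shared memory with locking, since each backend is a separate process.

```python
class WorkMemPool:
    """Hypothetical server-wide work-memory budget, in kilobytes."""

    def __init__(self, total_kb):
        self.total_kb = total_kb
        self.free_kb = total_kb

    def reserve(self, wanted_kb, minimum_kb=64):
        """Grant up to wanted_kb from whatever is left in the pool.

        Returns 0 if even the minimum is unavailable; the caller would then
        have to fall back (for example, spill a sort to disk) or wait.
        """
        granted = min(wanted_kb, self.free_kb)
        if granted < min(minimum_kb, wanted_kb):
            return 0
        self.free_kb -= granted
        return granted

    def release(self, kb):
        """Return memory to the pool once a query step has finished."""
        self.free_kb = min(self.total_kb, self.free_kb + kb)


pool = WorkMemPool(1024 * 1024)  # 1 GB shared by all backends together
a = pool.reserve(512 * 1024)     # gets 524288
b = pool.reserve(768 * 1024)     # only 524288 left, so gets that
c = pool.reserve(4096)           # pool exhausted: 0
pool.release(a)                  # free_kb back to 524288
```

Granting "whatever is left" means a backend that starts during a busy period gets less memory, which is the behaviour asked for above, and the total handed out can never exceed the configured budget.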
Matthew

-- 
Of course it's your fault. Everything here's your fault - it says so in your contract. - Quark