Re: Swap Space and vm.oom_kill_allocating_task

Andrew Sullivan <ajs@xxxxxxxxxxxxxxxxx> · Fri, 2 May 2008 10:55:41 -0400

On Fri, May 02, 2008 at 05:46:53PM +0300, Volkan YAZICI wrote:
> In our current structure, responsiveness has the
> highest priority and thus it is ok for us to cancel queries at that
> instant and re-initiate connections. To achieve this effect, I started
> to turn swap space off on some of the servers and turned
> vm.oom_kill_allocating_task kernel parameter on. (Periodic postgres
> process availability checks decide whether there is a need to fire up a
> fresh postgres instance.) So far, this method has worked pretty well, but
> I'm worried about data corruption. (The disks are configured as RAID
> 10.) What are the downsides of such a design scheme?

One big problem is that the OOM killer will quite possibly decide to
kill the postmaster daemon process rather than any of its children.
The children don't necessarily die in that case.  If you start up a
new postmaster at that point, you will almost certainly corrupt your
data.
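
A watchdog that restarts postgres could at least look for leftover
children before launching a new postmaster.  What follows is a minimal
sketch, assuming Linux's /proc filesystem and PostgreSQL's default
process titles (backends rename themselves "postgres: ...").  The
function name find_postgres_backends is hypothetical.  An empty result
does not prove a restart is safe; it only catches the obvious case of
orphaned children still running, and it will also report backends of
any other cluster on the same host.

```python
import os


def find_postgres_backends(proc_root="/proc"):
    """Return sorted PIDs whose process title starts with "postgres:".

    Assumes a Linux-style /proc tree and PostgreSQL's default process
    titles, in which backends and helper processes rename themselves
    "postgres: ...". The postmaster normally keeps its original command
    line (e.g. "postgres -D /path/to/data"), so it is not matched.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited meanwhile, or cmdline is unreadable
        # Arguments are NUL-separated; retitled processes may also pad with NULs.
        title = raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()
        if title.startswith("postgres:"):
            pids.append(int(entry))
    return sorted(pids)
```

The availability check described above could call
find_postgres_backends() and refuse to start a fresh postmaster while
the returned list is non-empty.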

Why are you allowing memory overcommit at all?  And what is causing
you to swap?  I think those are the things you need to fix.
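
On Linux, both questions can be answered from the kernel's own
accounting.  The sketch below, assuming the standard /proc/meminfo and
/proc/sys/vm/overcommit_memory files, reports the overcommit mode
(0 = heuristic, the kernel default; 1 = always overcommit; 2 = strict
accounting), current swap usage, and the gap between CommitLimit and
Committed_AS.  The function names parse_meminfo, memory_report and
read_live_report are hypothetical.

```python
# Map vm.overcommit_memory values to their documented meanings.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting (no overcommit)",
}


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            # Most values are in kB; HugePages_* fields are plain counts.
            info[key.strip()] = int(fields[0])
    return info


def memory_report(meminfo_text, overcommit_text):
    """Summarise overcommit mode, swap usage and commit headroom."""
    info = parse_meminfo(meminfo_text)
    mode = int(overcommit_text.strip())
    return {
        "overcommit_mode": OVERCOMMIT_MODES.get(mode, f"unknown ({mode})"),
        "swap_total_kb": info.get("SwapTotal", 0),
        "swap_used_kb": info.get("SwapTotal", 0) - info.get("SwapFree", 0),
        # The kernel enforces Committed_AS <= CommitLimit only in mode 2.
        "commit_headroom_kb": info.get("CommitLimit", 0) - info.get("Committed_AS", 0),
    }


def read_live_report():
    """Build the report from the running kernel (Linux only)."""
    with open("/proc/meminfo") as m, open("/proc/sys/vm/overcommit_memory") as o:
        return memory_report(m.read(), o.read())
```

Run read_live_report() on the database server.  Strict accounting is
usually enabled with "sysctl -w vm.overcommit_memory=2".  Note that
with swap turned off and the default vm.overcommit_ratio of 50, the
mode-2 CommitLimit is only about half of physical RAM, so the ratio may
need raising as well.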

A
-- 
Andrew Sullivan
ajs@xxxxxxxxxxxxxxxxx
+1 503 667 4564 x104
http://www.commandprompt.com/