Re: [patch -mm v2 04/10] oom: remove special handling for pagefault ooms

David Rientjes <rientjes@xxxxxxxxxx> · Mon, 1 Mar 2010 02:13:28 -0800 (PST)

On Mon, 1 Mar 2010, KAMEZAWA Hiroyuki wrote:

> On Fri, 26 Feb 2010 15:53:11 -0800 (PST)
> David Rientjes <rientjes@xxxxxxxxxx> wrote:
> 
> > It is possible to remove the special pagefault oom handler by simply
> > oom locking all system zones and then calling directly into
> > out_of_memory().
> > 
> > All populated zones must have ZONE_OOM_LOCKED set, otherwise there is a
> > parallel oom killing in progress that will lead to eventual memory
> > freeing so it's not necessary to needlessly kill another task.  The
> > context in which the pagefault is allocating memory is unknown to the oom
> > killer, so this is done on a system-wide level.
> > 
> > If a task has already been oom killed and hasn't fully exited yet, this
> > will be a no-op since select_bad_process() recognizes tasks across the
> > system with TIF_MEMDIE set.
> > 
> > The special handling to determine whether a parallel memcg is currently
> > oom is removed since we can detect future memory freeing with TIF_MEMDIE.
> > The memcg has already reached its memory limit, so it will still need to
> > kill a task regardless of the pagefault oom.
> > 
> > Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
> 
> NACK. please leave memcg's oom as it is. We're now rewriting.
> This is not core of your patch set. please skip.
> 

Your nack is completely unjustified; we're not going to stop oom killer
development so memcg can catch up.  This patch allows pagefaults to go
through the typical out_of_memory() interface, so there is no ambiguity
in how situations such as panic_on_oom are handled or in whether
current's memcg recently called the oom killer.  It also PREVENTS
needlessly killing tasks when a parallel oom condition exists but a task
hasn't been killed yet.

mem_cgroup_oom_called() is completely and utterly BOGUS since we can
detect the EXACT same conditions in two ways: by a tasklist scan filtered
on current's memcg that looks for parallel oom kills, which
out_of_memory() does, and by locking the zonelists to prevent racing
between calling out_of_memory() and actually setting the TIF_MEMDIE bit
for the selected task.
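
To make that argument concrete, here is a rough userspace model of the
ordering, not kernel code.  Every name in it (zone_model, task_model,
pagefault_oom_model() and the helpers) is a hypothetical stand-in, and
the "pick the first task in the memcg" step is only a placeholder for
the real victim selection:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not kernel definitions. */
struct zone_model { bool populated; bool oom_locked; };  /* models ZONE_OOM_LOCKED */
struct task_model { int memcg_id; bool memdie; };        /* models TIF_MEMDIE */

/* Lock every populated zone, or lock nothing if one is already locked. */
bool try_lock_all_zones(struct zone_model *zones, size_t nz)
{
    assert(zones != NULL);
    for (size_t i = 0; i < nz; i++)
        if (zones[i].populated && zones[i].oom_locked)
            return false;           /* parallel oom kill in progress */
    for (size_t i = 0; i < nz; i++)
        if (zones[i].populated)
            zones[i].oom_locked = true;
    return true;
}

void unlock_all_zones(struct zone_model *zones, size_t nz)
{
    for (size_t i = 0; i < nz; i++)
        if (zones[i].populated)
            zones[i].oom_locked = false;
}

/* Tasklist scan filtered on one memcg: has a victim already been marked? */
bool kill_in_progress(const struct task_model *tasks, size_t nt, int memcg_id)
{
    for (size_t i = 0; i < nt; i++)
        if (tasks[i].memcg_id == memcg_id && tasks[i].memdie)
            return true;
    return false;
}

/*
 * Returns the index of the task marked for kill, or -1 if nothing was
 * killed because another kill is in flight (or the memcg has no tasks).
 */
int pagefault_oom_model(struct zone_model *zones, size_t nz,
                        struct task_model *tasks, size_t nt, int memcg_id)
{
    int victim = -1;

    if (!try_lock_all_zones(zones, nz))
        return -1;

    if (!kill_in_progress(tasks, nt, memcg_id)) {
        /* Placeholder selection: first task in the memcg. */
        for (size_t i = 0; i < nt; i++) {
            if (tasks[i].memcg_id == memcg_id) {
                tasks[i].memdie = true;
                victim = (int)i;
                break;
            }
        }
    }

    unlock_all_zones(zones, nz);
    return victim;
}
```

In this model a second pagefault oom in the same memcg is a no-op while
the first victim still has its flag set, and a caller that finds any
populated zone already locked backs off without touching the tasklist.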

You said earlier that you would wait for the next mmotm to be released and
could easily rebase on my patchset.  Now you're stopping development
entirely and allowing tasks to be needlessly oom killed via the old
pagefault_out_of_memory(), which does not synchronize on parallel oom
kills.

I'm completely sure that you'll remove mem_cgroup_oom_called() entirely
yourself since it doesn't do anything but encourage VM_FAULT_OOM loops.
So please come up with some constructive criticism of my patch that
Andrew can use to decide whether to merge my work or not, instead of
thinking you're the only one that can touch memcg.
