Re: [patch -rc] oom: always return a badness score of non-zero for eligible tasks

David Rientjes <rientjes@xxxxxxxxxx> · Thu, 9 Sep 2010 14:40:19 -0700 (PDT)

On Thu, 9 Sep 2010, Dave Hansen wrote:

> > Hmm, could you verify that /proc/sys/vm/oom_dump_tasks is set?  Perhaps it's 
> > getting cleared by something else before you use zram.  The sysctl should 
> > default to on as of 2.6.36-rc1.
> 
> I double-checked.  It defaults to on and remains that way.
> 

Ok, I assume you aren't getting the typical "cat invoked oom-killer..." 
message, the memory state dump, etc., either, so there's something strange 
with your log level such that nothing under KERN_WARNING is getting 
through or you can't access the actual kernel log due to the panic.  I can 
capture all that information with a netdump on panic with 2.6.36-rc3.
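
For anyone wanting to rule out the log-level explanation, a minimal sketch 
of the two checks follows.  It assumes the usual layout of 
/proc/sys/kernel/printk, where the first field is console_loglevel, and 
that a message reaches the console only when its level is numerically lower 
than that value.  The helper names are made up for this illustration, and 
the check covers the console only, not the kernel log buffer.

```python
from pathlib import Path

KERN_WARNING = 4  # numeric printk level for KERN_WARNING


def parse_console_loglevel(printk_text: str) -> int:
    """Return the first field of /proc/sys/kernel/printk (console_loglevel)."""
    return int(printk_text.split()[0])


def reaches_console(message_level: int, console_loglevel: int) -> bool:
    """A message is printed to the console only if its level is
    numerically lower (more severe) than console_loglevel."""
    return message_level < console_loglevel


def flag_enabled(sysctl_text: str) -> bool:
    """Interpret a numeric sysctl such as vm.oom_dump_tasks: non-zero is on."""
    return int(sysctl_text.strip()) != 0


if __name__ == "__main__":
    # Reads the live values; only meaningful on a Linux machine.
    printk = Path("/proc/sys/kernel/printk").read_text()
    dump = Path("/proc/sys/vm/oom_dump_tasks").read_text()
    level = parse_console_loglevel(printk)
    print("console_loglevel:", level)
    print("KERN_WARNING reaches console:", reaches_console(KERN_WARNING, level))
    print("oom_dump_tasks enabled:", flag_enabled(dump))
```

If the second line prints False, the "invoked oom-killer" message would not 
appear on the console even though oom_dump_tasks is on.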

> I'll give the patch a shot and see if I get any better behavior.  But, I
> really do think the root cause here is compcache exhausting the system
> when you feed incompressible pages into it.  We can kill all the tasks
> we want, but I think it'll continue to gobble memory up as fast as we
> free it.
> 

That certainly seems to be the case and is the true topic of this thread, 
so I don't want to hijack it any further since it's outside the scope of 
the oom killer :)

But I'm still curious as to why the machine is hanging and not eventually 
panicking when we run out of killable tasks.  It seems as though something 
is hanging in the exit path, meaning memory reserves aren't even safe from 
compcache, or there's something wrong in the oom killer retry logic, or 
you're simply forking more tasks, perhaps as a response to threads getting 
killed by the kernel, than we can kill.

We'd certainly prefer to panic the machine when no work is getting done 
rather than simply kill everything that gets forked.  The problem before 
was that we panicked too early, before we had killed anything, and now we 
don't know when to panic appropriately.

> > Agreed, we'll need to address hugepages specifically because they don't 
> > get accounted for in rss but do free memory when the task is killed.
> 
> They do sometimes.  But, if they're preallocated, or stuck in a linked
> file on the filesystem, killing the task doesn't do any good.
> 

Indeed you're right, I meant s/hugepages/transparent hugepages/, sorry.  
It appears as though they get included in the rss of the allocating task, 
though, via MM_ANONPAGES, so this is already represented in the task's 
badness score.
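
As a rough illustration of why anonymous pages matter here, and of what the 
patch in the subject line is after, the sketch below scores a task by its 
share of memory and never returns zero for an eligible task.  This is not 
the kernel's code: the function name, the split into anon_pages and 
file_pages, and the 0..1000 scale are assumptions made for the example.

```python
def badness_sketch(anon_pages: int, file_pages: int, total_pages: int,
                   eligible: bool = True) -> int:
    """Illustrative score in 0..1000 (the scale is an assumption).

    Anonymous pages count toward rss, so memory charged to the task as
    anonymous pages raises its score.
    """
    if not eligible:
        return 0  # ineligible tasks are never selected
    rss = anon_pages + file_pages
    points = rss * 1000 // total_pages
    points = min(points, 1000)
    # An eligible task always gets a non-zero score, so a tiny task
    # cannot be mistaken for an unkillable one.
    return max(points, 1)
```

With this shape, a task using half of memory scores 500, while an eligible 
task with no resident pages still scores 1 rather than 0.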

Thanks for trying the patch out, Dave, I hope we can add your Tested-by 
line and it can get pushed to the rc-series.
