Re: [patch -mm] memcg: make oom killer a no-op when no killable task can be found

On Wed, 7 Apr 2010, KAMEZAWA Hiroyuki wrote:

> > > oom-badness-heuristic-rewrite.patch
> > 
> > Do you have any specific feedback that you could offer on why you decided 
> > to nack this?
> > 
> 
> I like this patch. But I think no one can Ack this because there is no
> "correct" answer. At least, this shows good behavior in my environment.
> 

Agreed.  I think the new oom_badness() function is much better than the 
current heuristic and should prevent X from being killed as we've 
discussed fairly often on LKML over the past six months.

> > Keeping /proc/pid/oom_adj around indefinitely isn't very helpful if 
> > there's a finer grained alternative available already unless you want 
> > /proc/pid/oom_adj to actually mean something in which case you'll never be 
> > able to separate oom badness scores from bitshifts.  I believe everyone 
> > agrees that a more understood and finer grained tunable is necessary as 
> > compared to the current implementation that has very limited functionality 
> > other than polarizing tasks.
> > 
> 
> If oom-badness-heuristic-rewrite.patch goes ahead, this should go.
> But my concern is that the administrator has to check every oom_score_adj
> and tune it again if he adds more memory to the system.
> 
> Now, a not-small number of people use virtual machines or containers. So
> oom_score_adj's sensitivity to the size of memory can put admins through hell.
> 

Would you necessarily want to change oom_score_adj when you add or remove 
memory?  I see the currently available pool of memory (whether it is 
system-wide, constrained to a cpuset's mems, mempolicy nodes, or memcg 
limits) as a shared resource, so if you want to bias a task by 25% of 
available memory by using an oom_score_adj of 250, that doesn't change if 
we add or remove memory.  It still means that the task should be biased by 
that amount in comparison to other tasks.
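
To illustrate the point above, here is a minimal sketch, assuming only what 
the 250 == 25% example implies: oom_score_adj is expressed in units of 
1/1000 of the memory available to the task.  The helper name 
bias_in_bytes() is hypothetical and is not part of the patch.

```python
# Assumption: oom_score_adj is in units of 1/1000 of available memory,
# as implied by "25% of available memory" == an oom_score_adj of 250.
OOM_SCORE_ADJ_SCALE = 1000
GIB = 1024 ** 3


def bias_in_bytes(oom_score_adj, available_bytes):
    """Memory quantity that an oom_score_adj corresponds to in a given pool.

    Hypothetical helper for illustration only; not a kernel interface.
    """
    return oom_score_adj * available_bytes // OOM_SCORE_ADJ_SCALE


# The same oom_score_adj applied to two pool sizes: the absolute bias
# scales with the pool, while the fraction of the pool stays the same.
for pool in (4 * GIB, 8 * GIB):
    bias = bias_in_bytes(250, pool)
    print(f"pool={pool // GIB} GiB  bias={bias / GIB:.1f} GiB  "
          f"fraction={bias / pool:.2f}")
```

The value written to oom_score_adj is the same in both cases; only the 
memory quantity it represents changes with the size of the pool.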

My perspective is that we should define oom killing priorities in terms of 
how much memory tasks are using compared to others and that the actual 
capacity itself is irrelevant if it's a shared resource.  So when tasks are 
moved into a memcg, for example, that becomes a "virtualized system" with 
a more limited shared memory resource and has the same bias (or 
preference) that it did when it was in the root cgroup.

In other words, I think it would be more inconvenient to update 
oom_score_adj anytime a task changes memcg, is attached to a different 
cpuset, or is bound to nodes by way of a mempolicy.  In these scenarios, I 
see them as simply having a restricted set of allowed memory yet the bias 
can remain the same.

Users who do actually want to bias a task by a memory quantity can easily 
do so, but I think they would be in the minority and we hope to avoid 
adding unnecessary tunables when a conversion to the appropriate 
oom_score_adj value is possible with a simple divide.
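
As a rough sketch of that divide, again assuming the 1/1000 scale implied 
by the 250 == 25% example: the function name adj_for_quantity() is 
hypothetical, and any clamping to the tunable's valid range is left out.

```python
# Assumption: oom_score_adj is in units of 1/1000 of available memory.
OOM_SCORE_ADJ_SCALE = 1000
MIB = 1024 ** 2
GIB = 1024 ** 3


def adj_for_quantity(bias_bytes, available_bytes):
    """Convert a desired bias, given as a memory quantity, to oom_score_adj.

    Hypothetical helper for illustration only; it does not clamp the
    result to whatever range the tunable accepts.
    """
    return round(bias_bytes * OOM_SCORE_ADJ_SCALE / available_bytes)


# A 512 MiB bias expressed against a 4 GiB pool.
print(adj_for_quantity(512 * MIB, 4 * GIB))

# The same 512 MiB bias against an 8 GiB pool needs a different value,
# which is the retuning cost of biasing by quantity instead of proportion.
print(adj_for_quantity(512 * MIB, 8 * GIB))
```

A user who wants a fixed quantity would redo this divide whenever the 
size of the pool changes; a user who wants a proportion writes the value 
once.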

> > > oom-replace-sysctls-with-quick-mode.patch
> > > 
> > > IIRC, Alan and Nick and I NAKed such a patch. Everybody explained the reason.
> > 
> > Which patch of the four you listed are you referring to here?
> > 
> Replacing a sysctl that is in use is a bad idea, in general.
> 

I agree, but since the audience for both of these sysctls will need to do 
echo 0 > /proc/sys/vm/oom_dump_tasks as the result of this patchset since 
it is now enabled by default, do you think we can take this as an 
opportunity to consolidate them down into one?  Otherwise, we're obliged 
to continue to support them indefinitely even though their only users are 
the exact same systems.

> I have no _strong_ opinion. I welcome the patch series. But the above are my concerns.
> Thank you for your work.
> 

Thanks, Kame, I appreciate that.
