Re: zram OOM behavior

Luigi Semenzato <semenzato@xxxxxxxxxx> · Tue, 30 Oct 2012 23:28:49 -0700

On Tue, Oct 30, 2012 at 11:14 PM, Luigi Semenzato <semenzato@xxxxxxxxxx> wrote:
> On Tue, Oct 30, 2012 at 9:46 PM, David Rientjes <rientjes@xxxxxxxxxx> wrote:
>> On Tue, 30 Oct 2012, Luigi Semenzato wrote:
>>
>>> Actually, there is a very simple fix:
>>>
>>> @@ -355,14 +364,6 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>>>                         if (p == current) {
>>>                                 chosen = p;
>>>                                 *ppoints = 1000;
>>> -                       } else if (!force_kill) {
>>> -                               /*
>>> -                                * If this task is not being ptraced on exit,
>>> -                                * then wait for it to finish before killing
>>> -                                * some other task unnecessarily.
>>> -                                */
>>> -                               if (!(p->group_leader->ptrace & PT_TRACE_EXIT))
>>> -                                       return ERR_PTR(-1UL);
>>>                         }
>>>                 }
>>>
>>> I'd rather kill some other task unnecessarily than hang!  My load
>>> works fine with this change.
>>>
>>
>> That's not an acceptable "fix" at all, it will lead to unnecessarily
>> killing processes when others are in the exit path, i.e. every oom kill
>> would kill two or three or more processes instead of just one.
>
> I am sorry, I didn't mean to suggest that this is the right fix for
> everybody.  It seems to work for us.  A real fix would be much harder,
> I think.  Certainly it would be for me.
>
> We don't rely on OOM-killing for memory management (we tried to, but
> it has drawbacks).  But OOM kills can still happen, so we have to deal
> with them.  We can deal with multiple processes being killed, but not
> with a hang.  I might be tempted to say that this should be true for
> everybody, but I can imagine systems that work by allowing only one
> process to die, and perhaps the load on those systems is such that
> they don't experience this deadlock often, or ever (even though I
> would be nervous about it).

To make it clear, I am suggesting that this "fix" might work as a
temporary workaround until a better fix is available.
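
For readers without the 3.4 source at hand, here is a minimal user-space
sketch of the decision the patch changes. It is a toy model, not kernel
code: struct toy_task, toy_select() and the wait_for_exiting flag are
hypothetical names invented for illustration, the badness values are made
up, and the enclosing "task is exiting" test is assumed because the quoted
hunk does not show it. wait_for_exiting = true models the original code
(with force_kill unset); false models the code with the hunk removed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for task_struct. */
struct toy_task {
    const char *name;
    unsigned int badness;  /* higher means a better OOM victim */
    bool exiting;          /* assumed: task is already in the exit path */
    bool is_current;       /* models p == current */
    bool traced_on_exit;   /* models PT_TRACE_EXIT on the group leader */
};

enum toy_result { TOY_NONE, TOY_VICTIM, TOY_ABORT };

/*
 * Pick a victim. When wait_for_exiting is true, the scan gives up as
 * soon as it meets an exiting task that is neither current nor traced
 * on exit, in the hope that the exit will free memory. If that task
 * never finishes exiting, every later call gives up the same way.
 */
static enum toy_result toy_select(const struct toy_task *tasks, size_t n,
                                  bool wait_for_exiting,
                                  const struct toy_task **victim)
{
    const struct toy_task *chosen = NULL;
    unsigned int best = 0;

    for (size_t i = 0; i < n; i++) {
        const struct toy_task *p = &tasks[i];

        if (p->exiting) {
            if (p->is_current) {
                chosen = p;
                best = 1000;
            } else if (wait_for_exiting && !p->traced_on_exit) {
                *victim = NULL;
                return TOY_ABORT;  /* models return ERR_PTR(-1UL) */
            }
        }
        if (p->badness > best) {
            chosen = p;
            best = p->badness;
        }
    }
    *victim = chosen;
    return chosen ? TOY_VICTIM : TOY_NONE;
}
```

Given one task that is stuck in the exit path and a larger task that is
not exiting, toy_select() with wait_for_exiting = true returns TOY_ABORT
on every call, which corresponds to the hang described above. With
wait_for_exiting = false it picks the larger task instead, at the cost
David points out: a task may be killed even though another one was
already on its way out.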

>> Could you please try this on 3.6 since all the code you're quoting is from
>> old kernels?
>
> I will see if I can do it, but we're shipping 3.4 and I am not sure
> about the status of our 3.6 tree.  I will also visually inspect the
> relevant 3.6 code and see if the possibility of deadlock is still
> there.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .