Re: [PATCH 0/5] Handle oom bypass more gracefully

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Mon, 30 May 2016 20:10:46 +0900

Michal Hocko wrote:
> > You are trying to make the OOM killer as per mm_struct operation. But
> > I think we need to tolerate the OOM killer as per signal_struct operation.
> 
> Signal struct based approach is full of weird behavior which just leads
> to corner cases. I think going mm struct way is the only sensible
> approach.

I don't think so. What are the corner cases that the OOM reaper cannot handle
with a signal_struct based approach?

The OOM-killer decides based on "struct mm_struct", but it is a weakness of
the OOM-killer that it cares only about "struct mm_struct". It is possible that
waiting for termination of only one thread releases a lot of memory (e.g.
by closing a pipe's file descriptor), in which case the OOM-killer does not
need to send SIGKILL to anybody. From the point of view of least killing,
waiting for an exiting task_struct is better than needlessly killing all the
thread groups using some mm_struct. The problem with the per task_struct
approach is that we have no trigger to give up waiting for that thread if that
thread seems to be stuck.

Commit 98748bd722005be9 ("oom: consider multi-threaded tasks in
task_will_free_mem") changed from the per task_struct approach to the per
signal_struct approach. And I think that the current situation is reasonable
because signal_struct is the unit for reacting to SIGKILL. If somebody
implements a userspace OOM-killer (maybe lowmemory killer?), the current
situation allows such an OOM-killer not to worry about OOM_SCORE_ADJ_MIN or
use_mm(). It is still possible that waiting for termination of only one thread
group releases a lot of memory. The problem here is that we have no trigger to
give up waiting for that thread group if that thread group seems to be stuck.
But it is trivial to use the OOM-reaper as a trigger to give up.
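
To illustrate what I mean by using the OOM-reaper as a trigger, here is a
rough userspace model. It is not kernel code; struct model_victim,
model_oom_decide() and the flag names are made up for this sketch. The
assumption is that the OOM-killer keeps waiting for an exiting thread group
only until the reaper reports that it is done with that victim.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of one OOM-killer invocation in this model. */
enum model_oom_action { MODEL_WAIT, MODEL_SELECT_NEXT };

/* Hypothetical per thread group (signal_struct) state; not a kernel struct. */
struct model_victim {
    bool exiting;   /* got SIGKILL or is already exiting */
    bool exited;    /* all threads in the group are gone */
    bool reaped;    /* the reaper finished with (or gave up on) this victim */
};

/*
 * Keep waiting for an exiting thread group, but stop waiting once the
 * reaper has processed it. That is the "give up" trigger.
 */
static enum model_oom_action model_oom_decide(const struct model_victim *v)
{
    if (v == NULL || v->exited)
        return MODEL_SELECT_NEXT;   /* nothing left to wait for */
    if (v->exiting && !v->reaped)
        return MODEL_WAIT;          /* memory may still be released */
    return MODEL_SELECT_NEXT;       /* reaper is done; victim looks stuck */
}
```

With this, a thread group that is exiting but stuck stops blocking further
victim selection as soon as reaped becomes true.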

That said, if everybody can agree with making the OOM-killer a per
"struct mm_struct" operation, I think that reimplementing oom_disable_count,
which was removed by commit c9f01245b6a7d77d ("oom: remove oom_disable_count")
(i.e. do not select an OOM victim unless all thread groups using that mm_struct
are killable), is better than ignoring what userspace told us to do (i.e.
select an OOM victim even if some thread groups using that mm_struct are not
killable). Userspace knows the risk of setting OOM_SCORE_ADJ_MIN; it is a
strong request like a __GFP_NOFAIL allocation. We have the global oom_lock,
which avoids race conditions. Since writes to /proc/pid/oom_score_adj are not
frequent, we can afford mutex_lock_killable(&oom_lock). We can interpret a
use_mm() request as setting OOM_SCORE_ADJ_MIN.
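
A rough userspace model of the oom_disable_count idea follows. It is only a
sketch under my own assumptions, not the removed kernel implementation:
struct model_mm, struct model_group and all model_*() helpers are made-up
names, rss_pages is a stand-in for whatever badness the OOM-killer computes,
and locking (oom_lock) is only mentioned in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OOM_SCORE_ADJ_MIN (-1000)   /* same value as the kernel constant */
#define OOM_SCORE_ADJ_MAX 1000

/* Hypothetical, simplified stand-in for struct mm_struct. */
struct model_mm {
    unsigned long rss_pages;    /* stand-in for the badness score */
    int oom_disable_count;      /* users of this mm that must not be killed */
};

/* Hypothetical stand-in for a thread group (signal_struct). */
struct model_group {
    struct model_mm *mm;
    int oom_score_adj;
};

/* Models a write to /proc/pid/oom_score_adj (would run under oom_lock). */
static void model_set_oom_score_adj(struct model_group *g, int new_adj)
{
    assert(new_adj >= OOM_SCORE_ADJ_MIN && new_adj <= OOM_SCORE_ADJ_MAX);
    bool was_min = (g->oom_score_adj == OOM_SCORE_ADJ_MIN);
    bool is_min = (new_adj == OOM_SCORE_ADJ_MIN);

    if (!was_min && is_min)
        g->mm->oom_disable_count++;
    else if (was_min && !is_min)
        g->mm->oom_disable_count--;
    g->oom_score_adj = new_adj;
}

/* Models use_mm()/unuse_mm(): treated like setting OOM_SCORE_ADJ_MIN. */
static void model_use_mm(struct model_mm *mm)   { mm->oom_disable_count++; }
static void model_unuse_mm(struct model_mm *mm) { mm->oom_disable_count--; }

/* Pick the largest mm whose users are all killable; NULL if there is none. */
static struct model_mm *model_select_victim(struct model_mm *mms, size_t n)
{
    struct model_mm *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (mms[i].oom_disable_count > 0)
            continue;   /* respect what userspace asked for */
        if (best == NULL || mms[i].rss_pages > best->rss_pages)
            best = &mms[i];
    }
    return best;
}
```

In this model an mm_struct is skipped as long as at least one thread group
using it has OOM_SCORE_ADJ_MIN or a use_mm() user exists, and it becomes
eligible again once the count drops back to zero.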
