Re: [PATCH 0/5] Handle oom bypass more gracefully

Michal Hocko <mhocko@xxxxxxxxxx> · Mon, 30 May 2016 13:35:04 +0200

On Mon 30-05-16 20:10:46, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > You are trying to make the OOM killer as per mm_struct operation. But
> > > I think we need to tolerate the OOM killer as per signal_struct operation.
> > 
> > Signal struct based approach is full of weird behavior which just leads
> > to corner cases. I think going mm struct way is the only sensible
> > approach.
> 
> I don't think so. What are corner cases the OOM reaper cannot handle with
> signal_struct based approach?

E.g. all the cases where the mm is shared outside of the thread group,
with all the weird inconsistencies that brings.

> The OOM-killer decides based on "struct mm_struct" but it is a weakness of
> the OOM-killer that it cares only "struct mm_struct". It is possible that
> waiting for termination of only one thread releases a lot of memory (e.g.
> by closing pipe's file descriptor) and the OOM-killer needs to send SIGKILL
> to nobody.

How can a thread release a pipe's memory when other threads are sharing
the same fd?

[...]
> Given that said, if everybody can agree with making the OOM-killer per
> "struct mm_struct" operation, I think reimplementing oom_disable_count which
> was removed by commit c9f01245b6a7d77d ("oom: remove oom_disable_count") (i.e.
> do not select an OOM victim unless all thread groups using that mm_struct are
> killable) seems to be better than ignoring what userspace told to do (i.e.
> select an OOM victim even if some thread groups using that mm_struct are not
> killable). Userspace knows the risk of setting OOM_SCORE_ADJ_MIN; it is a
> strong request like __GFP_NOFAIL allocation. We have global oom_lock which
> avoids race condition. Since writing to /proc/pid/oom_score_adj is not frequent,
> we can afford mutex_lock_killable(&oom_lock). We can interpret use_mm() request
> as setting OOM_SCORE_ADJ_MIN.

I am not really sure oom_lock is even needed. It is highly unlikely we
would race with an ongoing OOM killer. And even then the lock doesn't
bring much better semantics.

Regarding oom_disable_count, I think the current approach of
http://lkml.kernel.org/r/1464266415-15558-4-git-send-email-mhocko@xxxxxxxxxx
has one large advantage. Userspace can simply check the current
situation, whereas any internal flag/counter/whatever hides that
implementation fact, so userspace has no means to deal with it.
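
To illustrate what "check the current situation" could look like from
userspace, here is a minimal sketch that reads /proc/<pid>/oom_score_adj.
The helper name read_oom_score_adj() is made up for this example and is
not part of any kernel or libc interface; it only assumes a mounted
procfs.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): read the oom_score_adj value
 * the kernel currently reports for @pid via procfs.
 * Returns 0 on success and stores the value in *value, -1 on failure.
 */
static int read_oom_score_adj(pid_t pid, int *value)
{
	char path[64];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "/proc/%ld/oom_score_adj", (long)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;

	/* The file holds a single decimal integer. */
	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A process could call read_oom_score_adj(getpid(), &adj) after another
process sharing its mm has written to oom_score_adj and see whatever
value is in effect for itself at that moment.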

Sure, it can be argued that changing oom_score_adj behind the process's
back is nasty, but we already do that for threads and nobody seems to
complain. Shared mm between processes is just a different model of
threading from the MM point of view. Or is this thinking wrong in
principle?
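
For reference, a minimal userspace sketch of the "shared mm between
processes" case: clone() with CLONE_VM but without CLONE_THREAD creates
a child that has its own pid and thread group yet runs on the parent's
address space. The names child_fn() and run_shared_mm_demo() are
invented for this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Lives in the address space shared by parent and child. */
static volatile int shared_value;

/* Runs in the child process; the store is visible to the parent. */
static int child_fn(void *arg)
{
	shared_value = *(int *)arg;
	return 0;
}

/*
 * Hypothetical demo: create a separate process sharing our mm, let it
 * write @value, and return what the parent observes (-1 on error).
 */
static int run_shared_mm_demo(int value)
{
	const size_t stack_size = 64 * 1024;
	char *stack = malloc(stack_size);
	int status;
	pid_t pid;

	if (!stack)
		return -1;

	shared_value = 0;
	/* No CLONE_THREAD: the child is its own thread group. */
	pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, &value);
	if (pid < 0 || waitpid(pid, &status, 0) != pid) {
		free(stack);
		return -1;
	}

	free(stack);
	return shared_value;
}
```

Because the child is a distinct process, it has its own
/proc/<pid>/oom_score_adj even though it shares the mm with its parent.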
-- 
Michal Hocko
SUSE Labs
