Re: [PATCH] mm, oom: Tolerate processes sharing mm with different view of oom_score_adj.

On Wed 16-01-19 20:30:25, Tetsuo Handa wrote:
> On 2019/01/16 20:09, Michal Hocko wrote:
> > On Wed 16-01-19 19:55:21, Tetsuo Handa wrote:
> >> This patch reverts both commit 44a70adec910d692 ("mm, oom_adj: make sure
> >> processes sharing mm have same view of oom_score_adj") and commit
> >> 97fd49c2355ffded ("mm, oom: kill all tasks sharing the mm") in order to
> >> close a race and reduce the latency of __set_oom_adj(). It also reduces
> >> the warning output in __oom_kill_process() to minimize the latency.
> >>
> >> Commit 36324a990cf578b5 ("oom: clear TIF_MEMDIE after oom_reaper managed
> >> to unmap the address space") introduced the worst case mentioned in
> >> 44a70adec910d692. But since the OOM killer skips mm with MMF_OOM_SKIP set,
> >> only administrators can trigger the worst case.
> >>
> >> Since 44a70adec910d692 did not take latency into account, we can hold RCU
> >> for minutes and trigger RCU stall warnings by calling printk() on many
> >> thousands of thread groups. Even without printk(), the latency problem
> >> was reported by Yong-Taek Lee [1]. I also noticed that 44a70adec910d692
> >> is racy, and fixing the race would require a global lock, which is too
> >> costly for such rare events.
> >>
> >> If the worst case in 44a70adec910d692 happens, it is an administrator's
> >> request. Therefore, tolerate the worst case and speed up __set_oom_adj().
> > 
> > I really do not think we care about latency. I consider the overall API
> > sanity much more important. Besides that, the original report you are
> > referring to was never explained or shown to represent a real-world use
> > case. oom_score_adj is not really an interface to be tweaked in hot paths.
> 
> I do care about the latency. Holding RCU for more than 2 minutes is insane.

Creating 8k threads could be considered insane as well. But, more
seriously: I absolutely do not insist on holding a single RCU section
for the whole operation. That does not mean we want to revert these
changes, though. for_each_process is called from many paths, not only
this one.

-- 
Michal Hocko
SUSE Labs
