On Tue 08-03-16 19:59:15, Tetsuo Handa wrote: > out_of_memory() can stall effectively forever if a SCHED_IDLE thread > called out_of_memory() when there are !SCHED_IDLE threads running on > the same CPU, for schedule_timeout_killable(1) cannot return shortly > due to scheduling priority on CONFIG_PREEMPT_NONE=y kernels. > > Operations with oom_lock held should complete as soon as possible > because we might be preserving OOM condition for most of that period > if we are in OOM condition. SysRq-f can't work if oom_lock is held. > > It would be possible to boost scheduling priority of current thread > while holding oom_lock, but priority of current thread might be > manipulated by other threads after boosting. Unless we offload > operations with oom_lock held to a dedicated kernel thread with high > priority, addressing this problem using priority manipulation is racy. As already mentioned whe you posted a similar patch before, this all is true but the patch doesn't really fixes it so do we really want to have it in the patch description? > This patch brings schedule_timeout_killable(1) out of oom_lock. I am not against this change, though, but can we please have a changelog which doesn't pretend to be a fix while it is not? Also the changelog should tell us why oom vs. exit races are not more probable now because the OOM killing context is basically throttling all others right now which won't be the case anymore. While I believe this shouldn't matter it should be considered and described. > This patch does not address OOM notifiers which are blockable. Also does it really make sense to post the patch alone without addressing this part? > Long term we should focus on making the OOM context not preemptible. > > Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> > Signed-off-by: Michal Hocko <mhocko@xxxxxxx> my s-o-b is not really necessary, I have merely tried to show how your previous patch could be simplified. > --- > mm/oom_kill.c | 14 +++++++------- > mm/page_alloc.c | 7 +++++++ > 2 files changed, 14 insertions(+), 7 deletions(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 5d5eca9..c84e784 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -901,15 +901,9 @@ bool out_of_memory(struct oom_control *oc) > dump_header(oc, NULL, NULL); > panic("Out of memory and no killable processes...\n"); > } > - if (p && p != (void *)-1UL) { > + if (p && p != (void *)-1UL) > oom_kill_process(oc, p, points, totalpages, NULL, > "Out of memory"); > - /* > - * Give the killed process a good chance to exit before trying > - * to allocate memory again. > - */ > - schedule_timeout_killable(1); > - } > return true; > } > > @@ -944,4 +938,10 @@ void pagefault_out_of_memory(void) > } > > mutex_unlock(&oom_lock); > + > + /* > + * Give the killed process a good chance to exit before trying > + * to allocate memory again. > + */ > + schedule_timeout_killable(1); > } > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 1993894..378a346 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2888,6 +2888,13 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, > } > out: > mutex_unlock(&oom_lock); > + if (*did_some_progress && !page) { > + /* > + * Give the killed process a good chance to exit before trying > + * to allocate memory again. > + */ > + schedule_timeout_killable(1); > + } > return page; > } > > -- > 1.8.3.1 -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>