On Wed 16-03-16 20:16:47, Tetsuo Handa wrote:
> Michal Hocko wrote:
[...]
> > And just to prevent from a confusion. I mean waking up also when
> > fatal_signal_pending and we do not really go down to selecting an oom
> > victim. Which would be worth a separate patch on top of course.
>
> I couldn't understand this part. The shortcut
>
> 	if (current->mm &&
> 	    (fatal_signal_pending(current) || task_will_free_mem(current))) {
> 		mark_oom_victim(current);
> 		return true;
> 	}
>
> is not used for !__GFP_FS && !__GFP_NOFAIL allocation requests. I think
> we might go down to selecting an oom victim by out_of_memory() calls by
> not-yet-killed processes.

I meant something like the following. It would need some more tweaks of
course but here is the idea at least.

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 23b8b06152be..09e54bc0976c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -686,6 +686,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	task_lock(p);
 	if (p->mm && task_will_free_mem(p)) {
 		mark_oom_victim(p);
+		wake_oom_reaper(p);
 		task_unlock(p);
 		put_task_struct(p);
 		return;
@@ -869,10 +870,22 @@ bool out_of_memory(struct oom_control *oc)
 	if (current->mm &&
 	    (fatal_signal_pending(current) || task_will_free_mem(current))) {
 		mark_oom_victim(current);
+		wake_oom_reaper(current);
 		return true;
 	}
 
 	/*
+	 * XXX: This is a weak reclaim context when FS metadata couldn't be
+	 * reclaimed and so triggering the OOM killer could be really pre
+	 * mature at this point. Traditionally have been looping in the page
+	 * allocator and hoping for somebody else to make a forward progress
+	 * for us. It would be better to simply fail those requests but we
+	 * are not yet there so keep the tradition
+	 */
+	if (!(gfp_mask & __GFP_FS))
+		return true;
+
+	/*
 	 * Check if there were limitations on the allocation (only relevant for
 	 * NUMA) that may require different handling.
 	 */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d4d574dd0408..01121a89eb52 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2854,20 +2854,11 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 		/* The OOM killer does not needlessly kill tasks for lowmem */
 		if (ac->high_zoneidx < ZONE_NORMAL)
 			goto out;
-		/* The OOM killer does not compensate for IO-less reclaim */
-		if (!(gfp_mask & __GFP_FS)) {
-			/*
-			 * XXX: Page reclaim didn't yield anything,
-			 * and the OOM killer can't be invoked, but
-			 * keep looping as per tradition.
-			 *
-			 * But do not keep looping if oom_killer_disable()
-			 * was already called, for the system is trying to
-			 * enter a quiescent state during suspend.
-			 */
-			*did_some_progress = !oom_killer_disabled;
-			goto out;
-		}
+		/*
+		 * TODO once we are able to cope with GFP_NOFS allocation
+		 * failures more gracefully just return and fail the allocation
+		 * rather than trigger OOM
+		 */
 		if (pm_suspended_storage())
 			goto out;
 		/* The OOM killer may not free memory on a specific node */
-- 
Michal Hocko
SUSE Labs