On Fri 09-12-16 23:23:10, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Thu 08-12-16 00:29:26, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Tue 06-12-16 19:33:59, Tetsuo Handa wrote:
> > > > > If the OOM killer is invoked when many threads are looping inside the
> > > > > page allocator, it is possible that the OOM killer is preempted by other
> > > > > threads.
> > > >
> > > > Hmm, the only way I can see this would happen is when the task which
> > > > actually manages to take the lock is not invoking the OOM killer for
> > > > whatever reason. Is this what happens in your case? Are you able to
> > > > trigger this reliably?
> > >
> > > Regarding http://I-love.SAKURA.ne.jp/tmp/serial-20161206.txt.xz ,
> > > somebody called oom_kill_process() and reached the
> > >
> > >   pr_err("%s: Kill process %d (%s) score %u or sacrifice child\n",
> > >
> > > line but did not reach the
> > >
> > >   pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
> > >
> > > line within a tolerable delay.
> >
> > I would be really interested in that. This can happen only if
> > find_lock_task_mm fails. This would mean that either we are selecting a
> > child without mm or the selected victim has no mm anymore. Both cases
> > should be ephemeral because oom_badness will rule those tasks out on the
> > next round. So the primary question here is why no other task has hit
> > out_of_memory.
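
For reference, the flow in question inside oom_kill_process() is roughly
the following. This is an abridged sketch rather than the exact
mm/oom_kill.c code: the child-selection loop, refcounting and signal
delivery are elided, and the argument lists are reproduced from memory,
so they may not match the exact kernel version discussed here:

static void oom_kill_process(struct oom_control *oc, const char *message)
{
	struct task_struct *victim = oc->chosen;
	struct task_struct *p;

	/* The first message from the report above is printed here. */
	pr_err("%s: Kill process %d (%s) score %u or sacrifice child\n",
	       message, task_pid_nr(victim), victim->comm, oc->chosen_points);

	/*
	 * "or sacrifice child": a child with a higher badness score may
	 * replace the originally selected victim at this point.
	 */

	p = find_lock_task_mm(victim);
	if (!p) {
		/*
		 * No thread of the victim has a valid mm anymore, so we
		 * return here and never reach the second pr_err() below.
		 */
		put_task_struct(victim);
		return;
	}
	victim = p;	/* refcounting details skipped in this sketch */

	/* ... pin victim->mm, send SIGKILL, mark the OOM victim ... */

	/* The second message, which the reported log never gets to. */
	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
	       task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
	       K(get_mm_counter(victim->mm, MM_ANONPAGES)),
	       K(get_mm_counter(victim->mm, MM_FILEPAGES)),
	       K(get_mm_counter(victim->mm, MM_SHMEMPAGES)));
	task_unlock(victim);

	/* ... */
}
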
>
> This can also happen due to AB-BA livelock (oom_lock vs. console_sem).

Care to explain how that livelock would look?

> > Have you tried to instrument the kernel and see whether
> > GFP_NOFS contexts simply preempted any other attempt to get there?
> > I would find it quite unlikely but not impossible. If that is the case
> > we should really think how to move forward. One way is to make the oom
> > path fully synchronous as suggested below. The other is to tweak
> > GFP_NOFS some more and not take the lock while we are evaluating that.
> > This sounds quite messy though.
>
> Do you mean "tweak GFP_NOFS" as something like the patch below?
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3036,6 +3036,17 @@ void warn_alloc(gfp_t gfp_mask, const char *fmt, ...)
>
>  	*did_some_progress = 0;
>
> +	if (!(gfp_mask & (__GFP_FS | __GFP_NOFAIL))) {
> +		if ((current->flags & PF_DUMPCORE) ||
> +		    (order > PAGE_ALLOC_COSTLY_ORDER) ||
> +		    (ac->high_zoneidx < ZONE_NORMAL) ||
> +		    (pm_suspended_storage()) ||
> +		    (gfp_mask & __GFP_THISNODE))
> +			return NULL;
> +		*did_some_progress = 1;
> +		return NULL;
> +	}
> +
>  	/*
>  	 * Acquire the oom lock.  If that fails, somebody else is
>  	 * making progress for us.
>
> Then, serial-20161209-gfp.txt in http://I-love.SAKURA.ne.jp/tmp/20161209.tar.xz is
> the console log with the above patch applied. Spinning without invoking
> the OOM killer. It did not avoid locking up.

OK, so the reason for the lockup must be something different. If we are
really {dead,live}locking on the printk because of warn_alloc then that
path should be tweaked instead. Something like the patch below should
rule this out:
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ed65d7df72d5..c2ba51cec93d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3024,11 +3024,14 @@ void warn_alloc(gfp_t gfp_mask, const char *fmt, ...)
 	unsigned int filter = SHOW_MEM_FILTER_NODES;
 	struct va_format vaf;
 	va_list args;
+	static DEFINE_MUTEX(warn_lock);
 
 	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs) ||
 			debug_guardpage_minorder() > 0)
 		return;
 
+	mutex_lock(&warn_lock);
+
 	/*
 	 * This documents exceptions given to allocations in certain
 	 * contexts that are allowed to allocate outside current's set
@@ -3054,6 +3057,8 @@ void warn_alloc(gfp_t gfp_mask, const char *fmt, ...)
 	dump_stack();
 	if (!should_suppress_show_mem())
 		show_mem(filter);
+
+	mutex_unlock(&warn_lock);
 }
 
 static inline struct page *
--
Michal Hocko
SUSE Labs