On Fri 01-03-19 19:30:54, Tetsuo Handa wrote:
> On 2019/02/28 18:26, Michal Hocko wrote:
> > We cannot do anything about the preemption so that is moot. ALLOC_OOM
> > reserve is limited so the failure should happen sooner or later. But
>
> The problem is that preemption can slow down ALLOC_OOM allocations (at e.g.
> cond_resched() from direct reclaim path). Since concurrently allocating
> threads can consume CPU time, the OOM reaper can fail to wait for the OOM
> victim to complete (or fail) ALLOC_OOM allocations.

But this is an inherent problem and we cannot do anything about it
except for increasing the time the reaper keeps retrying.

> > I would be OK to check for fatal_signal_pending once per pmd or so if
> > that helps and it doesn't add a noticeable overhead.
>
> Another option is to scatter __GFP_NOMEMALLOC to allocations which might
> be used from fork() path.

This is not really maintainable. Page table allocations are used for
other purposes as well, not to mention that each arch would have to do
the same.

Why don't you simply try the fatal_signal_pending check per pmd for
starters? Then we can tune the retry count for the oom reaper.
-- 
Michal Hocko
SUSE Labs
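
For illustration only, here is a minimal userspace sketch of the "check
fatal_signal_pending once per pmd" idea. It is not kernel code and not a
proposed patch: struct task_model, model_fatal_signal_pending(),
model_copy_pmd_range(), the kill_at parameter and the -EINTR return value are
all hypothetical names and choices made up for this model. It only shows that
a copy loop which tests for a pending fatal signal before each pmd-sized chunk
stops allocating soon after the victim is killed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task state the kernel would consult. */
struct task_model {
	bool sigkill_pending;
};

/* Models fatal_signal_pending(current); an assumption, not the kernel helper. */
static bool model_fatal_signal_pending(const struct task_model *t)
{
	return t->sigkill_pending;
}

/*
 * Models a fork()-time page table copy over npmd pmd-sized chunks.
 * kill_at simulates SIGKILL arriving just before chunk kill_at is copied
 * (pass a value >= npmd for "never killed").
 * Returns 0 after a full copy, or -EINTR (an assumed error code) when the
 * loop bails out. *copied reports how many chunks were processed.
 */
static int model_copy_pmd_range(struct task_model *t, size_t npmd,
				size_t kill_at, size_t *copied)
{
	assert(copied != NULL);
	*copied = 0;

	for (size_t i = 0; i < npmd; i++) {
		if (i == kill_at)
			t->sigkill_pending = true;

		/* One cheap check per pmd, before any further allocation. */
		if (model_fatal_signal_pending(t))
			return -EINTR;

		/* Real code would allocate and fill page tables here. */
		(*copied)++;
	}
	return 0;
}
```

In this model a victim killed before the fourth chunk stops after three
chunks instead of copying the whole range. Whether a per-pmd check is cheap
enough in the real copy path is exactly what would need to be measured.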