Michal Hocko wrote:
> On Fri 19-05-17 22:02:44, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > Any allocation failure during the #PF path will return with VM_FAULT_OOM
> > > which in turn results in pagefault_out_of_memory. This can happen for
> > > 2 different reasons. a) Memcg is out of memory and we rely on
> > > mem_cgroup_oom_synchronize to perform the memcg OOM handling or b)
> > > normal allocation fails.
> > >
> > > The latter is quite problematic because allocation paths already trigger
> > > out_of_memory and the page allocator tries really hard to not fail
> >
> > We made many memory allocation requests from page fault path (e.g. XFS)
> > __GFP_FS some time ago, didn't we? But if I recall correctly (I couldn't
> > find the message), there are some allocation requests from page fault path
> > which cannot use __GFP_FS. Then, not all allocation requests can call
> > oom_kill_process() and reaching pagefault_out_of_memory() will be
> > inevitable.
>
> Even if such an allocation fails without the OOM killer then we simply
> retry the PF and will do that the same way how we keep retrying the
> allocation inside the page allocator. So how is this any different?

You are trying to remove out_of_memory() from pagefault_out_of_memory()
with this patch. But you also want to stop !__GFP_FS allocations from
retrying inside the page allocator in future kernels, don't you?
Then, a thread which needs to allocate memory from the page fault path but
cannot call oom_kill_process() will spin forever (unless somebody else calls
oom_kill_process() via a __GFP_FS allocation request). I consider
introducing such a possibility to be a problem.

> > > allocations. Anyway, if the OOM killer has been already invoked there
> > > is no reason to invoke it again from the #PF path. Especially when the
> > > OOM condition might be gone by that time and we have no way to find out
> > > other than allocate.
> > >
> > > Moreover if the allocation failed and the OOM killer hasn't been
> > > invoked then we are unlikely to do the right thing from the #PF context
> > > because we have already lost the allocation context and restrictions and
> > > therefore might oom kill a task from a different NUMA domain.
> >
> > If we carry a flag via task_struct that indicates whether it is a memory
> > allocation request from page fault and allocation failure is not acceptable,
> > we can call out_of_memory() from page allocator path.
>
> I do not understand

We need to allocate memory from the page fault path in order to avoid
spinning forever (unless somebody else calls oom_kill_process() via a
__GFP_FS allocation request), don't we? Then, memory allocation requests
from the page fault path can pass flags like __GFP_NOFAIL | __GFP_KILLABLE,
because retrying the page fault without allocating memory is pointless.
Passing such flags is what I meant by "carry a flag via task_struct".

> > By the way, can page fault occur after reaching do_exit()? When a thread
> > reached do_exit(), fatal_signal_pending(current) becomes false, doesn't it?
>
> yes fatal_signal_pending will be false at the time and I believe we can
> perform a page fault past that moment and go via allocation path which would
> trigger the OOM or give this task access to reserves but it is more
> likely that the oom reaper will push to kill another task by that time
> if the situation didn't get resolved. Or did I miss your concern?

How does checking fatal_signal_pending() here help? It only suppresses
printk(). If the current thread needs to allocate memory because not all
allocation requests can call oom_kill_process(), doing printk() is not the
right thing to do. Allocating memory by some means (e.g. __GFP_NOFAIL |
__GFP_KILLABLE) would be the right thing to do.
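
To make the "spin forever" concern concrete, here is a toy userspace model
in C. It is only a sketch under stated assumptions, not kernel code: every
name in it (toy_system, toy_alloc, toy_fault_retry) is hypothetical, the
may_oom_kill argument merely stands in for "this allocation context is
allowed to reach oom_kill_process()", and the retry loop is bounded only so
that the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are kernel APIs. */
struct toy_system {
    int free_pages;          /* pages currently available */
    int pages_freed_by_kill; /* pages an OOM kill would release */
};

/*
 * One allocation attempt. may_oom_kill models a context that is allowed
 * to invoke the OOM killer when no memory is left.
 */
static bool toy_alloc(struct toy_system *sys, bool may_oom_kill)
{
    if (sys->free_pages == 0 && may_oom_kill) {
        /* The "victim" releases its memory; it can only be killed once. */
        sys->free_pages += sys->pages_freed_by_kill;
        sys->pages_freed_by_kill = 0;
    }
    if (sys->free_pages > 0) {
        sys->free_pages--;
        return true;
    }
    return false;
}

/*
 * Retry the "page fault" until the allocation succeeds. Returns the
 * number of attempts used, or -1 if max_retries attempts made no
 * progress (the bounded stand-in for spinning forever).
 */
static int toy_fault_retry(struct toy_system *sys, bool may_oom_kill,
                           int max_retries)
{
    assert(max_retries > 0);
    for (int attempt = 1; attempt <= max_retries; attempt++) {
        if (toy_alloc(sys, may_oom_kill))
            return attempt;
    }
    return -1;
}
```

In this model, a faulting thread that may not trigger the kill never makes
progress on its own, however many times the fault is retried, while a
thread that may trigger it succeeds on the first attempt. Nothing here says
how the real page allocator behaves; it only restates the argument.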