On 2018/09/12 17:17, Michal Hocko wrote:
> On Wed 12-09-18 16:58:53, Tetsuo Handa wrote:
>> Michal Hocko wrote:
>>> OK, I will fold the following to the patch
>>
>> OK. But at that point, my patch which tries to wait for reclaimed memory
>> to be re-allocatable addresses a different problem which you are refusing.
>
> I am trying to address a real world example of when the excessive amount
> of memory is in page tables. As David pointed, this can happen with some
> userspace allocators.

My patch or David's patch will address it as well, without scattering
down_write(&mm->mmap_sem)/up_write(&mm->mmap_sem) like your attempt.

>
>> By the way, is it guaranteed that vma->vm_ops->close(vma) in remove_vma() never
>> sleeps? Since remove_vma() has might_sleep() since 2005, and that might_sleep()
>> predates the git history, I don't know what that ->close() would do.
>
> Hmm, I am afraid we cannot assume anything so we have to consider it
> unsafe. A cursory look at some callers shows that they are taking locks.
> E.g. drm_gem_object_put_unlocked might take a mutex. So MMF_OOM_SKIP
> would have to set right after releasing page tables.

I won't be happy unless the handed-over section can run in atomic context
(e.g. preempt_disable()/preempt_enable()), because the current thread might be
SCHED_IDLE priority.

If the current thread is SCHED_IDLE priority, it might be difficult to hand
over, because the current thread is unlikely to be able to reach

+	if (oom) {
+		/*
+		 * the exit path is guaranteed to finish without any unbound
+		 * blocking at this stage so make it clear to the caller.
+		 */
+		mm->mmap = NULL;
+		up_write(&mm->mmap_sem);
+	}

before the OOM reaper kernel thread (which is not SCHED_IDLE priority) checks
whether mm->mmap is already NULL.

Honestly, I'm not sure whether the current thread (even at !SCHED_IDLE
priority) can reach there before the OOM killer checks whether mm->mmap is
already NULL, for the current thread has to do more things than the OOM reaper
can do.

Also, in the worst case,

+			/*
+			 * oom_reaper cannot handle mlocked vmas but we
+			 * need to serialize it with munlock_vma_pages_all
+			 * which clears VM_LOCKED, otherwise the oom reaper
+			 * cannot reliably test it.
+			 */
+			if (oom)
+				down_write(&mm->mmap_sem);

would cause the OOM reaper to set MMF_OOM_SKIP without reclaiming any memory
if munlock_vma_pages_all(vma) by the current thread did not complete quickly
enough for the down_read_trylock(&mm->mmap_sem) attempt by the OOM reaper to
succeed.
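
To make the worst case above concrete, here is a rough userspace model of it.
This is only a sketch and not kernel code: struct model_mm, the model_*()
helpers, the retry bound and the "unlock_after" knob are all hypothetical names
made up for illustration, and the real lock is replaced by a plain flag. It
assumes the reaper makes a bounded number of trylock attempts and then sets the
skip flag whether or not it reclaimed anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of mm_struct that matter here. */
struct model_mm {
	bool write_locked;	/* models mmap_sem held for write by the exit path */
	bool oom_skip;		/* models MMF_OOM_SKIP */
	size_t reclaimable;	/* pages the reaper could still free */
	size_t reclaimed;	/* pages the reaper actually freed */
};

/* Models down_read_trylock(): fails while the exit path holds the write lock. */
static bool model_down_read_trylock(const struct model_mm *mm)
{
	return !mm->write_locked;
}

/*
 * Models the reaper: up to max_retries trylock attempts, then the skip flag
 * is set unconditionally. unlock_after is the attempt number at which the
 * (modelled) exit path finishes its munlock work and drops the write lock.
 */
static void model_oom_reaper(struct model_mm *mm, int max_retries,
			     int unlock_after)
{
	for (int attempt = 0; attempt < max_retries; attempt++) {
		if (attempt == unlock_after)
			mm->write_locked = false;
		if (model_down_read_trylock(mm)) {
			mm->reclaimed += mm->reclaimable;
			mm->reclaimable = 0;
			break;
		}
	}
	mm->oom_skip = true;
}

/* Runs one scenario and returns how many pages the reaper reclaimed. */
size_t model_run(int max_retries, int unlock_after)
{
	struct model_mm mm = { .write_locked = true, .reclaimable = 100 };

	model_oom_reaper(&mm, max_retries, unlock_after);
	assert(mm.oom_skip);	/* the skip flag is set in every scenario */
	return mm.reclaimed;
}
```

With model_run(10, 3) the exit path drops the lock in time and the reaper
reclaims everything. With model_run(10, 10) the lock is never dropped within
the retry budget, so the skip flag is set with nothing reclaimed, which is the
case I am worried about.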