Re: [patch v2] mm, oom: fix concurrent munlock and oom reaper unmap

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Thu, 19 Apr 2018 19:45:46 +0900

Michal Hocko wrote:
> > exit_mmap() does not block before set_bit(MMF_OOM_SKIP) once it is 
> > entered.
> 
> Not true. munlock_vma_pages_all might take page_lock which can have
> unpredictable dependencies. This is the reason why we are ruling out
> mlocked VMAs in the first place when reaping the address space.

Wow! Then,

> While you are correct, strictly speaking, because unmap_vmas can race
> with the oom reaper. With the lock held during the whole operation we
> can indeed trigger back off in the oom_reaper. It will keep retrying but
> the tear down can take quite some time. This is a fair argument. On the
> other hand your lock protocol introduces the MMF_OOM_SKIP problem I've
> mentioned above and that really worries me. The primary objective of the
> reaper is to guarantee a forward progress without relying on any
> externalities. We might kill another OOM victim but that is safer than
> lock up.

the current code allows the OOM reaper to be disturbed by
unpredictable dependencies, just as I worried in

  I think that there is a possibility that the OOM reaper tries to reclaim
  mlocked pages as soon as exit_mmap() cleared VM_LOCKED flag by calling
  munlock_vma_pages_all().

when the current approach was proposed. So we already have the MMF_OOM_SKIP
problem today. We need to teach the OOM reaper to stop reaping as soon as
exit_mmap() is entered. Maybe let the OOM reaper poll for progress (e.g.
checking whether none of get_mm_counter(mm, *) has decreased for the last
1 second)?
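
To make the polling idea concrete, here is a rough userspace model, not a
kernel patch. struct mm_model, struct progress_state, progress_init() and
progress_poll() are hypothetical names made up for illustration; in the
kernel the values would come from get_mm_counter(mm, i), and the caller
would decide what to do once a stall is reported.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the per-mm counters. */
#define MODEL_NR_COUNTERS 4

struct mm_model {
	long counter[MODEL_NR_COUNTERS];
};

/* Hypothetical bookkeeping kept by the poller between samples. */
struct progress_state {
	long last[MODEL_NR_COUNTERS];	/* counter values at the previous poll */
	unsigned long stalled_ms;	/* time since any counter last decreased */
};

static void progress_init(struct progress_state *st, const struct mm_model *mm)
{
	for (size_t i = 0; i < MODEL_NR_COUNTERS; i++)
		st->last[i] = mm->counter[i];
	st->stalled_ms = 0;
}

/*
 * Sample the counters once. elapsed_ms is the time since the previous
 * sample. Returns true when no counter has decreased for timeout_ms or
 * longer, i.e. the teardown looks stalled.
 */
static bool progress_poll(struct progress_state *st, const struct mm_model *mm,
			  unsigned long elapsed_ms, unsigned long timeout_ms)
{
	bool decreased = false;

	for (size_t i = 0; i < MODEL_NR_COUNTERS; i++) {
		if (mm->counter[i] < st->last[i])
			decreased = true;
		st->last[i] = mm->counter[i];
	}

	if (decreased)
		st->stalled_ms = 0;
	else
		st->stalled_ms += elapsed_ms;

	return st->stalled_ms >= timeout_ms;
}
```

With a 1000 ms timeout and 500 ms between samples, a stall is reported on
the second consecutive sample that sees no decrease, and any decrease
resets the timer.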