Michal Hocko wrote:
> > exit_mmap() does not block before set_bit(MMF_OOM_SKIP) once it is
> > entered.
>
> Not true. munlock_vma_pages_all might take page_lock which can have
> unpredictable dependences. This is the reason why we are ruling out
> mlocked VMAs in the first place when reaping the address space.

Wow! Then,

> While you are correct, strictly speaking, because unmap_vmas can race
> with the oom reaper. With the lock held during the whole operation we
> can indeed trigger back off in the oom_repaer. It will keep retrying but
> the tear down can take quite some time. This is a fair argument. On the
> other hand your lock protocol introduces the MMF_OOM_SKIP problem I've
> mentioned above and that really worries me. The primary objective of the
> reaper is to guarantee a forward progress without relying on any
> externalities. We might kill another OOM victim but that is safer than
> lock up.

the current code has a possibility that the OOM reaper is disturbed by
unpredictable dependencies, which is what I worried about when the current
approach was proposed:

  I think that there is a possibility that the OOM reaper tries to reclaim
  mlocked pages as soon as exit_mmap() cleared VM_LOCKED flag by calling
  munlock_vma_pages_all().

We currently have the MMF_OOM_SKIP problem. We need to teach the OOM reaper
to stop reaping as soon as exit_mmap() is entered. Maybe let the OOM reaper
poll for progress instead (e.g. give up when none of get_mm_counter(mm, *)
has decreased for the last 1 second)?
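
To make the polling idea concrete, here is a rough userspace sketch of the
bookkeeping I have in mind. It is not kernel code and not a patch: struct
sketch_mm, struct progress_watch, watch_start() and watch_stalled() are
hypothetical names, the counter array merely stands in for what
get_mm_counter(mm, *) would return, and time is an abstract tick value
(in the kernel this would presumably be jiffies).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-mm counters (not the kernel enum). */
enum {
	SKETCH_FILEPAGES,
	SKETCH_ANONPAGES,
	SKETCH_SWAPENTS,
	SKETCH_SHMEMPAGES,
	SKETCH_NR_COUNTERS
};

/* Hypothetical mm: only the counters matter for this sketch. */
struct sketch_mm {
	long counters[SKETCH_NR_COUNTERS];
};

/* Last sampled counter values and the tick of the last observed decrease. */
struct progress_watch {
	long last[SKETCH_NR_COUNTERS];
	unsigned long last_progress;
};

/* Take the initial snapshot; "now" counts as the last time of progress. */
static void watch_start(struct progress_watch *w, const struct sketch_mm *mm,
			unsigned long now)
{
	assert(w && mm);
	for (size_t i = 0; i < SKETCH_NR_COUNTERS; i++)
		w->last[i] = mm->counters[i];
	w->last_progress = now;
}

/*
 * Sample the counters. If any of them decreased since the previous sample,
 * record "now" as the last time of progress. Return true when no counter
 * has decreased for at least "timeout" ticks, i.e. the teardown looks stuck.
 */
static bool watch_stalled(struct progress_watch *w, const struct sketch_mm *mm,
			  unsigned long now, unsigned long timeout)
{
	bool progressed = false;

	assert(w && mm);
	for (size_t i = 0; i < SKETCH_NR_COUNTERS; i++) {
		if (mm->counters[i] < w->last[i])
			progressed = true;
		w->last[i] = mm->counters[i];
	}
	if (progressed)
		w->last_progress = now;
	return now - w->last_progress >= timeout;
}
```

The intent is that the OOM reaper would call something like watch_stalled()
periodically while exit_mmap() is running, and only fall back to selecting
another OOM victim once it returns true. Whether a fixed 1 second timeout is
the right threshold is an open question.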