On Tue, May 10, 2022 at 6:06 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Mon 09-05-22 20:00:13, Suren Baghdasaryan wrote:
> > With the oom-killer being able to operate on locked pages, exit_mmap
> > does not need to ensure that oom_reap_task_mm is done before it can
> > proceed. Instead it can rely on mmap_lock write lock to prevent
> > oom-killer from operating on the vma tree while it's freeing page
> > tables. exit_mmap can hold mmap_lock read lock when unmapping vmas
> > and then take mmap_lock write lock before freeing page tables.
>
> The changelog is rather light on nasty details which might be good but
> for the sake of our future us let's be more verbose so that we do not
> have to reinvent the prior history each time we are looking into this
> code. I would go with something like this instead:
> "
> The primary reason to invoke the oom reaper from the exit_mmap path used
> to be a prevention of an excessive oom killing if the oom victim exit
> races with the oom reaper (see 212925802454 ("mm: oom: let oom_reap_task
> and exit_mmap run concurrently") for more details). The invocation has
> moved around since then because of the interaction with the munlock
> logic but the underlying reason has remained the same (see 27ae357fa82b
> ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")).
>
> Munlock code is no longer a problem since a213e5cf71cb ("mm/munlock:
> delete munlock_vma_pages_all(), allow oomreap") and there shouldn't be
> any blocking operation before the memory is unmapped by exit_mmap so
> the oom reaper invocation can be dropped. The unmapping part can be done
> with the non-exclusive mmap_sem and the exclusive one is only required
> when page tables are freed.
>
> Remove the oom_reaper from exit_mmap which will make the code easier to
> read. This is really unlikely to make any observable difference although
> some microbenchmarks could benefit from one less branch that needs to be
> evaluated even though it almost never is true.
> "

Looks great! Thanks for collecting all the history. Will update the
description.

> One minor comment below. Other than that \o/ this is finally going away.
> I strongly suspect that the history of this code is a nice example about how
> over optimizing code can cause more harm than good.
>
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>

Thanks.

> Thanks!
>
> > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > ---
> >  include/linux/oom.h |  2 --
> >  mm/mmap.c           | 25 ++++++-------------------
> >  mm/oom_kill.c       |  2 +-
> >  3 files changed, 7 insertions(+), 22 deletions(-)
> >
> [...]
> > @@ -3138,6 +3121,10 @@ void exit_mmap(struct mm_struct *mm)
> >  	/* update_hiwater_rss(mm) here? but nobody should be looking */
> >  	/* Use -1 here to ensure all VMAs in the mm are unmapped */
> >  	unmap_vmas(&tlb, vma, 0, -1);
> > +	mmap_read_unlock(mm);
> > +	/* Set MMF_OOM_SKIP to disregard this mm from further consideration.*/
> > +	set_bit(MMF_OOM_SKIP, &mm->flags);
>
> I think that it would be slightly more readable to add an empty line
> above and below of this. Also the comment would be more helpful if it
> explained what the further consideration actually means. I would go with
>
> 	/*
> 	 * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
> 	 * because the memory has been already freed. Do not bother
> 	 * checking mm_is_oom_victim because setting a bit
> 	 * unconditionally is just cheaper.
> 	 */

Ack.

> > +	mmap_write_lock(mm);
> >  	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
> >  	tlb_finish_mmu(&tlb);
>
> --
> Michal Hocko
> SUSE Labs