On Wed, Jun 1, 2022 at 3:22 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 1 Jun 2022 15:13:54 -0700 Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> > On Wed, Jun 1, 2022 at 3:09 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > >
> > > The patch titled
> > >      Subject: mm-drop-oom-code-from-exit_mmap-fix
> > > has been added to the -mm mm-unstable branch.  Its filename is
> > >      mm-drop-oom-code-from-exit_mmap-fix.patch
> > >
> > > This patch will shortly appear at
> > >      https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-drop-oom-code-from-exit_mmap-fix.patch
> > >
> > > This patch will later appear in the mm-unstable branch at
> > >      git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > >
> > > Before you just go and hit "reply", please:
> > >    a) Consider who else should be cc'ed
> > >    b) Prefer to cc a suitable mailing list as well
> > >    c) Ideally: find the original patch on the mailing list and do a
> > >       reply-to-all to that, adding suitable additional cc's
> > >
> > > *** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
> > >
> > > The -mm tree is included into linux-next via the mm-everything
> > > branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > > and is updated there every 2-3 working days
> > >
> > > ------------------------------------------------------
> > > From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > Subject: mm-drop-oom-code-from-exit_mmap-fix
> > > Date: Wed Jun  1 03:07:59 PM PDT 2022
> > >
> > > fix it for presense of mapletree patchset, per Suren
> >
> > Andrew, could we also restore the mmap_lock read/write retaking? It
> > does make a difference for parallel memory reaping.
> > If you want I can post a v3 with more explanation about that and also
> > a separate patchset applied over maple-tree branch (if you have it
> > separately). Should I?
>
> Like this?

Almost.
See below:

> --- a/mm/mmap.c~mm-drop-oom-code-from-exit_mmap-fix-fix
> +++ a/mm/mmap.c
> @@ -3171,13 +3171,13 @@ void exit_mmap(struct mm_struct *mm)
>  	/* mm's last user has gone, and its about to be pulled down */
>  	mmu_notifier_release(mm);
>
> -	mmap_write_lock(mm);
> +	mmap_read_lock(mm);
>  	arch_exit_mmap(mm);
>
>  	vma = mas_find(&mas, ULONG_MAX);
>  	if (!vma) {
>  		/* Can happen if dup_mmap() received an OOM */
> -		mmap_write_unlock(mm);
> +		mmap_read_unlock(mm);
>  		return;
>  	}
>
> @@ -3187,6 +3187,7 @@ void exit_mmap(struct mm_struct *mm)
>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>  	/* Use ULONG_MAX here to ensure all VMAs in the mm are unmapped */
>  	unmap_vmas(&tlb, &mm->mm_mt, vma, 0, ULONG_MAX);
> +	mmap_read_unlock(mm);
>
>  	/*
>  	 * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
>  	 * mm_is_oom_victim because setting a bit unconditionally is cheaper.
>  	 */
> @@ -3194,6 +3195,7 @@ void exit_mmap(struct mm_struct *mm)
>  	set_bit(MMF_OOM_SKIP, &mm->flags);
> +	mmap_write_unlock(mm);

Instead of the line above ^^^

+
+	mmap_write_lock(mm);

So the code should read as:

```
	mmap_read_unlock(mm);

	/*
	 * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
	 * because the memory has been already freed. Do not bother checking
	 * mm_is_oom_victim because setting a bit unconditionally is cheaper.
	 */
	set_bit(MMF_OOM_SKIP, &mm->flags);
	mmap_write_lock(mm);
	free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS, ...
```

>  	free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
>  		      USER_PGTABLES_CEILING);
>  	tlb_finish_mmu(&tlb);
> _
>
>
> void exit_mmap(struct mm_struct *mm)
> {
> 	struct mmu_gather tlb;
> 	struct vm_area_struct *vma;
> 	unsigned long nr_accounted = 0;
> 	MA_STATE(mas, &mm->mm_mt, 0, 0);
> 	int count = 0;
>
> 	/* mm's last user has gone, and its about to be pulled down */
> 	mmu_notifier_release(mm);
>
> 	mmap_read_lock(mm);
> 	arch_exit_mmap(mm);
>
> 	vma = mas_find(&mas, ULONG_MAX);
> 	if (!vma) {
> 		/* Can happen if dup_mmap() received an OOM */
> 		mmap_read_unlock(mm);
> 		return;
> 	}
>
> 	lru_add_drain();
> 	flush_cache_mm(mm);
> 	tlb_gather_mmu_fullmm(&tlb, mm);
> 	/* update_hiwater_rss(mm) here? but nobody should be looking */
> 	/* Use ULONG_MAX here to ensure all VMAs in the mm are unmapped */
> 	unmap_vmas(&tlb, &mm->mm_mt, vma, 0, ULONG_MAX);
> 	mmap_read_unlock(mm);
>
> 	/*
> 	 * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
> 	 * because the memory has been already freed. Do not bother checking
> 	 * mm_is_oom_victim because setting a bit unconditionally is cheaper.
> 	 */
> 	set_bit(MMF_OOM_SKIP, &mm->flags);
> 	mmap_write_unlock(mm);
> 	free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
> 		      USER_PGTABLES_CEILING);
> 	tlb_finish_mmu(&tlb);
>
> 	/*
> 	 * Walk the list again, actually closing and freeing it, with preemption
> 	 * enabled, without holding any MM locks besides the unreachable
> 	 * mmap_write_lock.
> 	 */
> 	do {
> 		if (vma->vm_flags & VM_ACCOUNT)
> 			nr_accounted += vma_pages(vma);
> 		remove_vma(vma);
> 		count++;
> 		cond_resched();
> 	} while ((vma = mas_find(&mas, ULONG_MAX)) != NULL);
>
> 	BUG_ON(count != mm->map_count);
>
> 	trace_exit_mmap(mm);
> 	__mt_destroy(&mm->mm_mt);
> 	mmap_write_unlock(mm);
> 	vm_unacct_memory(nr_accounted);
> }
>