Re: + mm-drop-oom-code-from-exit_mmap-fix.patch added to mm-unstable branch

On Wed, Jun 1, 2022 at 3:22 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 1 Jun 2022 15:13:54 -0700 Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> > On Wed, Jun 1, 2022 at 3:09 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > >
> > > The patch titled
> > >      Subject: mm-drop-oom-code-from-exit_mmap-fix
> > > has been added to the -mm mm-unstable branch.  Its filename is
> > >      mm-drop-oom-code-from-exit_mmap-fix.patch
> > >
> > > This patch will shortly appear at
> > >      https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-drop-oom-code-from-exit_mmap-fix.patch
> > >
> > > This patch will later appear in the mm-unstable branch at
> > >     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > >
> > > Before you just go and hit "reply", please:
> > >    a) Consider who else should be cc'ed
> > >    b) Prefer to cc a suitable mailing list as well
> > >    c) Ideally: find the original patch on the mailing list and do a
> > >       reply-to-all to that, adding suitable additional cc's
> > >
> > > *** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
> > >
> > > The -mm tree is included into linux-next via the mm-everything
> > > branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > > and is updated there every 2-3 working days
> > >
> > > ------------------------------------------------------
> > > From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > Subject: mm-drop-oom-code-from-exit_mmap-fix
> > > Date: Wed Jun  1 03:07:59 PM PDT 2022
> > >
> > > fix it for presence of the maple-tree patchset, per Suren
> >
> > Andrew, could we also restore the mmap_lock read/write retaking? It
> > does make a difference for parallel memory reaping.
> > If you want, I can post a v3 with more explanation of that, and also
> > a separate patchset applied on top of the maple-tree branch (if you
> > keep it separately). Should I?
>
> Like this?

Almost. See below:

>
>
> --- a/mm/mmap.c~mm-drop-oom-code-from-exit_mmap-fix-fix
> +++ a/mm/mmap.c
> @@ -3171,13 +3171,13 @@ void exit_mmap(struct mm_struct *mm)
>         /* mm's last user has gone, and its about to be pulled down */
>         mmu_notifier_release(mm);
>
> -       mmap_write_lock(mm);
> +       mmap_read_lock(mm);
>         arch_exit_mmap(mm);
>
>         vma = mas_find(&mas, ULONG_MAX);
>         if (!vma) {
>                 /* Can happen if dup_mmap() received an OOM */
> -               mmap_write_unlock(mm);
> +               mmap_read_unlock(mm);
>                 return;
>         }
>
> @@ -3187,6 +3187,7 @@ void exit_mmap(struct mm_struct *mm)
>         /* update_hiwater_rss(mm) here? but nobody should be looking */
>         /* Use ULONG_MAX here to ensure all VMAs in the mm are unmapped */
>         unmap_vmas(&tlb, &mm->mm_mt, vma, 0, ULONG_MAX);
> +       mmap_read_unlock(mm);
>
>         /*
>          * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
> @@ -3194,6 +3195,7 @@ void exit_mmap(struct mm_struct *mm)
>          * mm_is_oom_victim because setting a bit unconditionally is cheaper.
>          */
>         set_bit(MMF_OOM_SKIP, &mm->flags);
> +       mmap_write_unlock(mm);

Instead of the mmap_write_unlock() line above ^^^ we should be *taking*
the write lock here:
+
+       mmap_write_lock(mm);

So the code should read:
```
        mmap_read_unlock(mm);

        /*
         * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
         * because the memory has been already freed. Do not bother checking
         * mm_is_oom_victim because setting a bit unconditionally is cheaper.
         */
        set_bit(MMF_OOM_SKIP, &mm->flags);

        mmap_write_lock(mm);
        free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
...
```
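
To make the intent concrete, here is a minimal userspace model of that
ordering (just a sketch with a pthread rwlock standing in for mmap_lock,
not kernel code; exit_mmap_model() and oom_skip are made-up names): the
unmap phase runs under the read lock so the OOM reaper can unmap in
parallel, MMF_OOM_SKIP is set once nothing useful is left, and only the
page-table/VMA teardown takes the write lock to exclude the reaper.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static pthread_rwlock_t mmap_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_bool oom_skip;    /* stands in for MMF_OOM_SKIP */

/* Models the tail of exit_mmap() with the corrected lock ordering. */
static void exit_mmap_model(void)
{
        /* Unmap phase: read lock only, so a reaper can run in parallel. */
        pthread_rwlock_rdlock(&mmap_lock);
        puts("exit: unmap_vmas()");
        pthread_rwlock_unlock(&mmap_lock);

        /* All memory is unmapped; tell the reaper not to bother. */
        atomic_store(&oom_skip, 1);

        /*
         * Freeing page tables and destroying the VMA tree must exclude
         * the reaper, so only this phase takes the write lock.
         */
        pthread_rwlock_wrlock(&mmap_lock);
        puts("exit: free_pgtables() + remove_vma() walk");
        pthread_rwlock_unlock(&mmap_lock);
}

int main(void)
{
        exit_mmap_model();
        return 0;
}
```

The point is that the expensive unmap_vmas() pass no longer serializes
against the reaper; only the comparatively short teardown phase does.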


>         free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
>                       USER_PGTABLES_CEILING);
>         tlb_finish_mmu(&tlb);
> _
>
>
>
> void exit_mmap(struct mm_struct *mm)
> {
>         struct mmu_gather tlb;
>         struct vm_area_struct *vma;
>         unsigned long nr_accounted = 0;
>         MA_STATE(mas, &mm->mm_mt, 0, 0);
>         int count = 0;
>
>         /* mm's last user has gone, and its about to be pulled down */
>         mmu_notifier_release(mm);
>
>         mmap_read_lock(mm);
>         arch_exit_mmap(mm);
>
>         vma = mas_find(&mas, ULONG_MAX);
>         if (!vma) {
>                 /* Can happen if dup_mmap() received an OOM */
>                 mmap_read_unlock(mm);
>                 return;
>         }
>
>         lru_add_drain();
>         flush_cache_mm(mm);
>         tlb_gather_mmu_fullmm(&tlb, mm);
>         /* update_hiwater_rss(mm) here? but nobody should be looking */
>         /* Use ULONG_MAX here to ensure all VMAs in the mm are unmapped */
>         unmap_vmas(&tlb, &mm->mm_mt, vma, 0, ULONG_MAX);
>         mmap_read_unlock(mm);
>
>         /*
>          * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
>          * because the memory has been already freed. Do not bother checking
>          * mm_is_oom_victim because setting a bit unconditionally is cheaper.
>          */
>         set_bit(MMF_OOM_SKIP, &mm->flags);
>         mmap_write_unlock(mm);
>         free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
>                       USER_PGTABLES_CEILING);
>         tlb_finish_mmu(&tlb);
>
>         /*
>          * Walk the list again, actually closing and freeing it, with preemption
>          * enabled, without holding any MM locks besides the unreachable
>          * mmap_write_lock.
>          */
>         do {
>                 if (vma->vm_flags & VM_ACCOUNT)
>                         nr_accounted += vma_pages(vma);
>                 remove_vma(vma);
>                 count++;
>                 cond_resched();
>         } while ((vma = mas_find(&mas, ULONG_MAX)) != NULL);
>
>         BUG_ON(count != mm->map_count);
>
>         trace_exit_mmap(mm);
>         __mt_destroy(&mm->mm_mt);
>         mmap_write_unlock(mm);
>         vm_unacct_memory(nr_accounted);
> }
>
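
For completeness, the reaper side that this coordinates with only ever
takes the lock for read and backs off once MMF_OOM_SKIP is set, so it
can never overlap with free_pgtables(), which now runs under the write
lock. A userspace sketch of that check (again a model, not the actual
mm/oom_kill.c code; reap_task_mm() is a made-up stand-in):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_rwlock_t mmap_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_bool oom_skip;    /* stands in for MMF_OOM_SKIP */

static bool reap_task_mm(void)
{
        /* Never block: if the exit path holds the write lock, retry later. */
        if (pthread_rwlock_tryrdlock(&mmap_lock) != 0)
                return false;

        if (atomic_load(&oom_skip)) {
                /* exit_mmap() already unmapped everything; nothing to do. */
                pthread_rwlock_unlock(&mmap_lock);
                return true;
        }

        puts("reaper: unmapping victim memory under the read lock");
        pthread_rwlock_unlock(&mmap_lock);
        return true;
}

int main(void)
{
        reap_task_mm();                 /* can run while exit unmaps */
        atomic_store(&oom_skip, 1);     /* exit_mmap() sets the flag */
        reap_task_mm();                 /* now a no-op */
        return 0;
}
```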


