On Thu 02-06-22 13:31:27, Liam Howlett wrote:
> * Michal Hocko <mhocko@xxxxxxxx> [220602 02:53]:
> > On Wed 01-06-22 14:47:41, Suren Baghdasaryan wrote:
> > > On Wed, Jun 1, 2022 at 2:36 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > [...]
> > > > But iirc mapletree wants to retain a write_lock here, so I ended up with
> > > >
> > > > void exit_mmap(struct mm_struct *mm)
> > > > {
> > > > 	struct mmu_gather tlb;
> > > > 	struct vm_area_struct *vma;
> > > > 	unsigned long nr_accounted = 0;
> > > > 	MA_STATE(mas, &mm->mm_mt, 0, 0);
> > > > 	int count = 0;
> > > >
> > > > 	/* mm's last user has gone, and its about to be pulled down */
> > > > 	mmu_notifier_release(mm);
> > > >
> > > > 	mmap_write_lock(mm);
> > > > 	arch_exit_mmap(mm);
> > > >
> > > > 	vma = mas_find(&mas, ULONG_MAX);
> > > > 	if (!vma) {
> > > > 		/* Can happen if dup_mmap() received an OOM */
> > > > 		mmap_write_unlock(mm);
> > > > 		return;
> > > > 	}
> > > >
> > > > 	lru_add_drain();
> > > > 	flush_cache_mm(mm);
> > > > 	tlb_gather_mmu_fullmm(&tlb, mm);
> > > > 	/* update_hiwater_rss(mm) here? but nobody should be looking */
> > > > 	/* Use ULONG_MAX here to ensure all VMAs in the mm are unmapped */
> > > > 	unmap_vmas(&tlb, &mm->mm_mt, vma, 0, ULONG_MAX);
> > > >
> > > > 	/*
> > > > 	 * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
> > > > 	 * because the memory has been already freed. Do not bother checking
> > > > 	 * mm_is_oom_victim because setting a bit unconditionally is cheaper.
> > > > 	 */
> > > > 	set_bit(MMF_OOM_SKIP, &mm->flags);
> > > > 	free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
> > > > 		      USER_PGTABLES_CEILING);
> > > > 	tlb_finish_mmu(&tlb);
> > > >
> > > > 	/*
> > > > 	 * Walk the list again, actually closing and freeing it, with preemption
> > > > 	 * enabled, without holding any MM locks besides the unreachable
> > > > 	 * mmap_write_lock.
> > > > 	 */
> > > > 	do {
> > > > 		if (vma->vm_flags & VM_ACCOUNT)
> > > > 			nr_accounted += vma_pages(vma);
> > > > 		remove_vma(vma);
> > > > 		count++;
> > > > 		cond_resched();
> > > > 	} while ((vma = mas_find(&mas, ULONG_MAX)) != NULL);
> > > >
> > > > 	BUG_ON(count != mm->map_count);
> > > >
> > > > 	trace_exit_mmap(mm);
> > > > 	__mt_destroy(&mm->mm_mt);
> > > > 	mm->mmap = NULL;
> > >
> > > ^^^ this line above needs to be removed when the patch is applied over
> > > the maple tree patchset.
> >
> > I am not fully up to date on the maple tree changes. Could you explain
> > why resetting mm->mmap is not needed anymore please?
>
> The maple tree patch set removes the linked list, including mm->mmap.
> The call to __mt_destroy() means none of the old VMAs can be found in
> the race condition that mm->mmap = NULL was solving.

Thanks for the clarification, Liam.
-- 
Michal Hocko
SUSE Labs