The patch titled Subject: mm: document locking restrictions for vm_operations_struct::close has been added to the -mm tree. Its filename is mm-document-locking-restrictions-for-vm_operations_struct-close.patch This patch should soon appear at https://ozlabs.org/~akpm/mmots/broken-out/mm-document-locking-restrictions-for-vm_operations_struct-close.patch and later at https://ozlabs.org/~akpm/mmotm/broken-out/mm-document-locking-restrictions-for-vm_operations_struct-close.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Suren Baghdasaryan <surenb@xxxxxxxxxx> Subject: mm: document locking restrictions for vm_operations_struct::close Patch series "mm: protect free_pgtables with mmap_lock write lock in exit_mmap", v4. oom-reaper and process_mrelease system call should protect against races with exit_mmap which can destroy page tables while they walk the VMA tree. oom-reaper protects from that race by setting MMF_OOM_VICTIM and by relying on exit_mmap to set MMF_OOM_SKIP before taking and releasing mmap_write_lock. process_mrelease has to elevate mm->mm_users to prevent such race. Both oom-reaper and process_mrelease hold mmap_read_lock when walking the VMA tree. The locking rules and mechanisms could be simpler if exit_mmap takes mmap_write_lock while executing destructive operations such as free_pgtables. Change exit_mmap to hold the mmap_write_lock when calling free_pgtables and remove_vma. Operations like unmap_vmas and unlock_range are not destructive and could run under mmap_read_lock but for simplicity we take one mmap_write_lock during almost the entire operation. Note also that because oom-reaper checks VM_LOCKED flag, unlock_range() should not be allowed to race with it. Before this patch, remove_vma used to be called with no locks held, however with fput being executed asynchronously and vm_ops->close not being allowed to hold mmap_lock (it is called from __split_vma with mmap_sem held for write), changing that should be fine. In most cases this lock should be uncontended. Previously, Kirill reported ~4% regression caused by a similar change [1]. We reran the same test and although the individual results are quite noisy, the percentiles show lower regression with 1.6% being the worst case [2]. The change allows oom-reaper and process_mrelease to execute safely under mmap_read_lock without worries that exit_mmap might destroy page tables from under them. [1] https://lore.kernel.org/all/20170725141723.ivukwhddk2voyhuc@xxxxxxxxxxxxxxxxxx/ [2] https://lore.kernel.org/all/CAJuCfpGC9-c9P40x7oy=jy5SphMcd0o0G_6U1-+JAziGKG6dGA@xxxxxxxxxxxxxx/ This patch (of 2): Add comments for vm_operations_struct::close documenting locking requirements for this callback and its callers. Link: https://lkml.kernel.org/r/20211208212211.2860249-1-surenb@xxxxxxxxxx Link: https://lkml.kernel.org/r/20211208212211.2860249-2-surenb@xxxxxxxxxx Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx> Cc: Michal Hocko <mhocko@xxxxxxxx> Cc: David Rientjes <rientjes@xxxxxxxxxx> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> Cc: Johannes Weiner <hannes@xxxxxxxxxxx> Cc: Roman Gushchin <guro@xxxxxx> Cc: Rik van Riel <riel@xxxxxxxxxxx> Cc: Minchan Kim <minchan@xxxxxxxxxx> Cc: Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> Cc: Christian Brauner <christian@xxxxxxxxxx> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> Cc: Oleg Nesterov <oleg@xxxxxxxxxx> Cc: David Hildenbrand <david@xxxxxxxxxx> Cc: Jann Horn <jannh@xxxxxxxxxx> Cc: Shakeel Butt <shakeelb@xxxxxxxxxx> Cc: Andy Lutomirski <luto@xxxxxxxxxx> Cc: Christian Brauner <christian.brauner@xxxxxxxxxx> Cc: Florian Weimer <fweimer@xxxxxxxxxx> Cc: Jan Engelhardt <jengelh@xxxxxxx> Cc: Tim Murray <timmurray@xxxxxxxxxx> Cc: Mauro Carvalho Chehab <mchehab@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- include/linux/mm.h | 4 ++++ 1 file changed, 4 insertions(+) --- a/include/linux/mm.h~mm-document-locking-restrictions-for-vm_operations_struct-close +++ a/include/linux/mm.h @@ -532,6 +532,10 @@ enum page_entry_size { */ struct vm_operations_struct { void (*open)(struct vm_area_struct * area); + /** + * @close: Called when the VMA is being removed from the MM. + * Context: User context. May sleep. Caller holds mmap_lock. + */ void (*close)(struct vm_area_struct * area); /* Called any time before splitting to check if it's allowed */ int (*may_split)(struct vm_area_struct *area, unsigned long addr); _ Patches currently in -mm which might be from surenb@xxxxxxxxxx are mm-add-a-field-to-store-names-for-private-anonymous-memory-fix.patch mm-add-anonymous-vma-name-refcounting.patch mm-document-locking-restrictions-for-vm_operations_struct-close.patch mm-oom_kill-allow-process_mrelease-to-run-under-mmap_lock-protection.patch sysctl-change-watermark_scale_factor-max-limit-to-30%.patch