The patch titled Subject: mm/memory.c: do_fault: avoid usage of stale vm_area_struct has been added to the -mm tree. Its filename is mm-memoryc-do_fault-avoid-usage-of-stale-vm_area_struct.patch This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/mm-memoryc-do_fault-avoid-usage-of-stale-vm_area_struct.patch and later at http://ozlabs.org/~akpm/mmotm/broken-out/mm-memoryc-do_fault-avoid-usage-of-stale-vm_area_struct.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Jan Stancek <jstancek@xxxxxxxxxx> Subject: mm/memory.c: do_fault: avoid usage of stale vm_area_struct LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8. This is a stress test, where one thread mmaps/writes/munmaps memory area and other thread is trying to read from it: CPU: 0 PID: 2611 Comm: mmap1 Not tainted 5.0.0-rc8+ #51 Hardware name: IBM 2964 N63 400 (z/VM 6.4.0) Krnl PSW : 0404e00180000000 00000000001ac8d8 (__lock_acquire+0x7/0x7a8) Call Trace: ([<0000000000000000>] (null)) [<00000000001adae4>] lock_acquire+0xec/0x258 [<000000000080d1ac>] _raw_spin_lock_bh+0x5c/0x98 [<000000000012a780>] page_table_free+0x48/0x1a8 [<00000000002f6e54>] do_fault+0xdc/0x670 [<00000000002fadae>] __handle_mm_fault+0x416/0x5f0 [<00000000002fb138>] handle_mm_fault+0x1b0/0x320 [<00000000001248cc>] do_dat_exception+0x19c/0x2c8 [<000000000080e5ee>] pgm_check_handler+0x19e/0x200 page_table_free() is called with NULL mm parameter, but because "0" is a valid address on s390 (see S390_lowcore), it keeps going until it eventually crashes in lockdep's lock_acquire. This crash is reproducible at least since 4.14. Problem is that "vmf->vma" used in do_fault() can become stale. Because mmap_sem may be released, other threads can come in, call munmap() and cause "vma" be returned to kmem cache, and get zeroed/re-initialized and re-used: handle_mm_fault | __handle_mm_fault | do_fault | vma = vmf->vma | do_read_fault | __do_fault | vma->vm_ops->fault(vmf); | mmap_sem is released | | | do_munmap() | remove_vma_list() | remove_vma() | vm_area_free() | # vma is released | ... | # same vma is allocated | # from kmem cache | do_mmap() | vm_area_alloc() | memset(vma, 0, ...) | pte_free(vma->vm_mm, ...); | page_table_free | spin_lock_bh(&mm->context.lock);| <crash> | Cache mm_struct to avoid using potentially stale "vma". [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c Link: http://lkml.kernel.org/r/5b3fdf19e2a5be460a384b936f5b56e13733f1b8.1551595137.git.jstancek@xxxxxxxxxx Signed-off-by: Jan Stancek <jstancek@xxxxxxxxxx> Reviewed-by: Andrea Arcangeli <aarcange@xxxxxxxxxx> Reviewed-by: Matthew Wilcox <willy@xxxxxxxxxxxxx> Acked-by: Rafael Aquini <aquini@xxxxxxxxxx> Reviewed-by: Minchan Kim <minchan@xxxxxxxxxx> Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> Cc: Rik van Riel <riel@xxxxxxxxxxx> Cc: Michal Hocko <mhocko@xxxxxxxx> Cc: Huang Ying <ying.huang@xxxxxxxxx> Cc: Souptick Joarder <jrdr.linux@xxxxxxxxx> Cc: Jerome Glisse <jglisse@xxxxxxxxxx> Cc: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx> Cc: David Hildenbrand <david@xxxxxxxxxx> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> Cc: David Rientjes <rientjes@xxxxxxxxxx> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> Cc: <stable@xxxxxxxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- --- a/mm/memory.c~mm-memoryc-do_fault-avoid-usage-of-stale-vm_area_struct +++ a/mm/memory.c @@ -3536,10 +3536,13 @@ static vm_fault_t do_shared_fault(struct * but allow concurrent faults). * The mmap_sem may have been released depending on flags and our * return value. See filemap_fault() and __lock_page_or_retry(). + * If mmap_sem is released, vma may become invalid (for example + * by other thread calling munmap()). */ static vm_fault_t do_fault(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; + struct mm_struct *vm_mm = vma->vm_mm; vm_fault_t ret; /* @@ -3580,7 +3583,7 @@ static vm_fault_t do_fault(struct vm_fau /* preallocated pagetable is unused: free it */ if (vmf->prealloc_pte) { - pte_free(vma->vm_mm, vmf->prealloc_pte); + pte_free(vm_mm, vmf->prealloc_pte); vmf->prealloc_pte = NULL; } return ret; _ Patches currently in -mm which might be from jstancek@xxxxxxxxxx are mm-memoryc-do_fault-avoid-usage-of-stale-vm_area_struct.patch