Hello Michel, I altered the vma_adjust code and it's triggering what looks like to be a false positive in vma_rb_erase->validate_mm_rb with CONFIG_DEBUG_VM_RB=y. So what happens is normally remove_next == 1 or == 2, and set vma->vm_end to next->vm_end and then call validate_mm_rb(next) and it passes and then unlink "next" (removed from vm_next/prev and rbtree). I introduced a new case to fix a bug remove_next == 3 that actually removes "vma" and sets next->vm_start = vma->vm_start. So the old code was always doing: vma->vm_end = next->vm_end vma_rb_erase(next) // in __vma_unlink vma->vm_next = next->vm_next // in __vma_unlink next = vma->vm_next vma_gap_update(next) The new code still does the above for remove_next == 1 and 2, but for remove_next ==3 it has been changed and it does: next->vm_start = vma->vm_start vma_rb_erase(vma) // in __vma_unlink vma_gap_update(next) However it bugs out in vma_rb_erase(vma) because next->vm_start was reduced. However I tend to think what I'm executing is correct. It's pointless to call vma_gap_update before I can call vm_rb_erase anyway so certainly I can't fix it that way. I'm forced to remove "vma" from the rbtree before I can call vma_gap_update(next). So I did other tests: diff --git a/mm/mmap.c b/mm/mmap.c index 27f0509..a38c8a0 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -400,15 +400,9 @@ static inline void vma_rb_insert(struct vm_area_struct *vma, rb_insert_augmented(&vma->vm_rb, root, &vma_gap_callbacks); } -static void vma_rb_erase(struct vm_area_struct *vma, struct rb_root *root) +static void __vma_rb_erase(struct vm_area_struct *vma, struct rb_root *root) { /* - * All rb_subtree_gap values must be consistent prior to erase, - * with the possible exception of the vma being erased. - */ - validate_mm_rb(root, vma); - - /* * Note rb_erase_augmented is a fairly large inline function, * so make sure we instantiate it only once with our desired * augmented rbtree callbacks. @@ -416,6 +410,18 @@ static void vma_rb_erase(struct vm_area_struct *vma, struct rb_root *root) rb_erase_augmented(&vma->vm_rb, root, &vma_gap_callbacks); } +static __always_inline void vma_rb_erase(struct vm_area_struct *vma, + struct rb_root *root) +{ + /* + * All rb_subtree_gap values must be consistent prior to erase, + * with the possible exception of the vma being erased. + */ + validate_mm_rb(root, vma); + + __vma_rb_erase(vma, root); +} + /* * vma has some anon_vma assigned, and is already inserted on that * anon_vma's interval trees. @@ -606,7 +612,10 @@ static __always_inline void __vma_unlink_common(struct mm_struct *mm, { struct vm_area_struct *next; - vma_rb_erase(vma, &mm->mm_rb); + if (has_prev) + vma_rb_erase(vma, &mm->mm_rb); + else + __vma_rb_erase(vma, &mm->mm_rb); next = vma->vm_next; if (has_prev) prev->vm_next = next; @@ -892,9 +901,11 @@ again: end = next->vm_end; goto again; } - else if (next) + else if (next) { vma_gap_update(next); - else + if (remove_next == 3) + validate_mm_rb(&mm->mm_rb, next); + } else mm->highest_vm_end = end; } if (insert && file) The above shifts the validate_mm_rb(next) for the remove_next == 3 case from before the rb_removal of "vma" to after vma_gap_update is called on "next". This works fine. So if you agree this is a false positive of CONFIG_DEBUG_MM_RB and there was no actual bug, I just suggest to shut off the warning by telling validate_mm_rb not to ignore the vma that is being removed but the next one, if the next->vm_start was reduced to overlap over the vma that is being removed. This shut off the warning just fine for me and it leaves the validation in place and always enabled. Just it skips the check on the next vma that was updated instead of the one that is being removed if it was the next one that had next->vm_start reduced. On a side note I also noticed "mm->highest_vm_end = end" is erroneous, it should be VM_WARN_ON(mm->highest_vm_end != end) but that's offtopic. So this would be the patch I'd suggest to shut off the false positive, it's a noop when CONFIG_DEBUG_VM_RB=n. >From fc256d7f71cd6295a5258387c0cb2af9134d16a2 Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli <aarcange@xxxxxxxxxx> Date: Tue, 20 Sep 2016 15:01:33 +0200 Subject: [PATCH 1/1] mm: vma_merge: correct false positive from __vma_unlink->validate_mm_rb The old code was always doing: vma->vm_end = next->vm_end vma_rb_erase(next) // in __vma_unlink vma->vm_next = next->vm_next // in __vma_unlink next = vma->vm_next vma_gap_update(next) The new code still does the above for remove_next == 1 and 2, but for remove_next == 3 it has been changed and it does: next->vm_start = vma->vm_start vma_rb_erase(vma) // in __vma_unlink vma_gap_update(next) In the latter case, while unlinking "vma", validate_mm_rb() is told to ignore "vma" that is being removed, but next->vm_start was reduced instead. So for the new case, to avoid the false positive from validate_mm_rb, it should be "next" that is ignored when "vma" is being unlinked. "vma" and "next" in the above comment, considered pre-swap(). Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx> --- mm/mmap.c | 62 ++++++++++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 44 insertions(+), 18 deletions(-) diff --git a/mm/mmap.c b/mm/mmap.c index 52d77a9..b8db812 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -398,15 +398,9 @@ static inline void vma_rb_insert(struct vm_area_struct *vma, rb_insert_augmented(&vma->vm_rb, root, &vma_gap_callbacks); } -static void vma_rb_erase(struct vm_area_struct *vma, struct rb_root *root) +static void __vma_rb_erase(struct vm_area_struct *vma, struct rb_root *root) { /* - * All rb_subtree_gap values must be consistent prior to erase, - * with the possible exception of the vma being erased. - */ - validate_mm_rb(root, vma); - - /* * Note rb_erase_augmented is a fairly large inline function, * so make sure we instantiate it only once with our desired * augmented rbtree callbacks. @@ -414,6 +408,32 @@ static void vma_rb_erase(struct vm_area_struct *vma, struct rb_root *root) rb_erase_augmented(&vma->vm_rb, root, &vma_gap_callbacks); } +static __always_inline void vma_rb_erase_ignore(struct vm_area_struct *vma, + struct rb_root *root, + struct vm_area_struct *ignore) +{ + /* + * All rb_subtree_gap values must be consistent prior to erase, + * with the possible exception of the "next" vma being erased if + * next->vm_start was reduced. + */ + validate_mm_rb(root, ignore); + + __vma_rb_erase(vma, root); +} + +static __always_inline void vma_rb_erase(struct vm_area_struct *vma, + struct rb_root *root) +{ + /* + * All rb_subtree_gap values must be consistent prior to erase, + * with the possible exception of the vma being erased. + */ + validate_mm_rb(root, vma); + + __vma_rb_erase(vma, root); +} + /* * vma has some anon_vma assigned, and is already inserted on that * anon_vma's interval trees. @@ -600,11 +620,15 @@ static void __insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma) static __always_inline void __vma_unlink_common(struct mm_struct *mm, struct vm_area_struct *vma, struct vm_area_struct *prev, - bool has_prev) + bool has_prev, + struct vm_area_struct *ignore) { struct vm_area_struct *next; - vma_rb_erase(vma, &mm->mm_rb); + if (has_prev) + vma_rb_erase_ignore(vma, &mm->mm_rb, ignore); + else + vma_rb_erase_ignore(vma, &mm->mm_rb, ignore); next = vma->vm_next; if (has_prev) prev->vm_next = next; @@ -626,13 +650,7 @@ static inline void __vma_unlink_prev(struct mm_struct *mm, struct vm_area_struct *vma, struct vm_area_struct *prev) { - __vma_unlink_common(mm, vma, prev, true); -} - -static inline void __vma_unlink(struct mm_struct *mm, - struct vm_area_struct *vma) -{ - __vma_unlink_common(mm, vma, NULL, false); + __vma_unlink_common(mm, vma, prev, true, vma); } /* @@ -811,8 +829,16 @@ again: if (remove_next != 3) __vma_unlink_prev(mm, next, vma); else - /* vma is not before next if they've been swapped */ - __vma_unlink(mm, next); + /* + * vma is not before next if they've been + * swapped. + * + * pre-swap() next->vm_start was reduced so + * tell validate_mm_rb to ignore pre-swap() + * "next" (which is stored in post-swap() + * "vma"). + */ + __vma_unlink_common(mm, next, NULL, false, vma); if (file) __remove_shared_vm_struct(next, file, mapping); } else if (insert) { ----- Forwarded message from kernel test robot <xiaolong.ye@xxxxxxxxx> ----- Date: Tue, 20 Sep 2016 19:11:26 +0800 From: kernel test robot <xiaolong.ye@xxxxxxxxx> To: Andrea Arcangeli <aarcange@xxxxxxxxxx> Cc: lkp@xxxxxx Subject: [mm] 0331ab667f: kernel BUG at mm/mmap.c:327! User-Agent: Heirloom mailx 12.5 6/20/10 FYI, we noticed the following commit: https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git master commit 0331ab667f082a781b9380cac1461dcca0515bc4 ("mm: vma_merge: fix vm_page_prot SMP race condition against rmap_walk") in testcase: trinity with following parameters: runtime: 300s Trinity is a linux system call fuzz tester. on test machine: qemu-system-x86_64 -enable-kvm -cpu IvyBridge -m 360M caused below changes: +------------------------------------------+------------+------------+ | | 7da550f576 | 0331ab667f | +------------------------------------------+------------+------------+ | boot_successes | 18 | 12 | | boot_failures | 4 | 10 | | invoked_oom-killer:gfp_mask=0x | 4 | | | Mem-Info | 4 | | | kernel_BUG_at_mm/mmap.c | 0 | 10 | | invalid_opcode:#[##]PREEMPT | 0 | 10 | | RIP:validate_mm_rb | 0 | 10 | | calltrace:SyS_mprotect | 0 | 9 | | Kernel_panic-not_syncing:Fatal_exception | 0 | 10 | +------------------------------------------+------------+------------+ [ 40.690337] pgoff 0 file ffff8800111b2000 private_data (null) [ 40.690337] flags: 0xfb(read|write|shared|mayread|maywrite|mayexec|mayshare) [ 40.700682] ------------[ cut here ]------------ [ 40.701451] kernel BUG at mm/mmap.c:327! [ 40.702391] invalid opcode: 0000 [#1] PREEMPT [ 40.703087] CPU: 0 PID: 364 Comm: trinity-c1 Not tainted 4.8.0-rc6-00314-g0331ab6 #1 [ 40.704315] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 40.705711] task: ffff8800111d13c0 task.stack: ffff88001124c000 [ 40.706649] RIP: 0010:[<ffffffff811641c4>] [<ffffffff811641c4>] validate_mm_rb+0x32/0x4b [ 40.707956] RSP: 0018:ffff88001124fcf0 EFLAGS: 00010282 [ 40.708795] RAX: 0000000000000145 RBX: ffff8800112e0910 RCX: 0000000000000000 [ 40.709919] RDX: ffffffff82445980 RSI: ffffffff8243d1e8 RDI: ffffffff8243d1e8 [ 40.710742] RBP: ffff88001124fd08 R08: 0000000000000001 R09: 0000000000000000 [ 40.711488] R10: 0000000000000000 R11: 0000000000000005 R12: ffff8800112e08f0 [ 40.712232] R13: ffff880011258bb0 R14: ffff88001117eac0 R15: ffff88001117eac8 [ 40.712968] FS: 0000000000000000(0000) GS:ffffffff82424000(0063) knlGS:0000000008d7c840 [ 40.713808] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 [ 40.714423] CR2: 0000000008d7c8a8 CR3: 0000000011246000 CR4: 00000000001406b0 [ 40.715165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 40.715905] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000050602 [ 40.716646] Stack: [ 40.716867] ffff880011258bb0 ffff8800112e08f0 ffff880011258bb0 ffff88001124fd50 [ 40.717705] ffffffff811643cd ffff880011b3a8b8 ffff880011258bb0 ffff880011258bb0 [ 40.718598] ffff8800112e08f0 ffff880011258bb0 ffff88001117eac0 0000000000000003 [ 40.719432] Call Trace: [ 40.719698] [<ffffffff811643cd>] vma_rb_erase+0x22/0x1cd [ 40.720270] [<ffffffff81164ac1>] __vma_adjust+0x3d3/0x697 [ 40.720846] [<ffffffff810e84d4>] ? mark_held_locks+0x50/0x6e [ 40.721452] [<ffffffff811650ff>] vma_merge+0x22c/0x27d [ 40.721998] [<ffffffff81167e17>] mprotect_fixup+0x10b/0x23c [ 40.722606] [<ffffffff811680bc>] SyS_mprotect+0x174/0x205 [ 40.723183] [<ffffffff810017e6>] do_fast_syscall_32+0x159/0x2aa [ 40.723815] [<ffffffff81db29a0>] entry_SYSENTER_compat+0x50/0x5f [ 40.724455] Code: 89 f5 41 54 53 e8 5d 86 35 00 eb 29 4c 8d 63 e0 4d 39 ec 74 18 4c 89 e7 e8 4e fa ff ff 48 39 43 18 74 0a 4c 89 e7 e8 02 58 ff ff <0f> 0b 48 89 df e8 6e 86 35 00 48 85 c0 48 89 c3 75 cf 5b 41 5c [ 40.727469] RIP [<ffffffff811641c4>] validate_mm_rb+0x32/0x4b [ 40.728097] RSP <ffff88001124fcf0> [ 40.776529] ---[ end trace e91f627109713d4e ]--- [ 40.777062] Kernel panic - not syncing: Fatal exception -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>