Hi Davidlohr, On Mon, Mar 10, 2014 at 02:08:08PM -0700, Davidlohr Bueso wrote: > On Mon, 2014-03-10 at 10:43 +0800, Fengguang Wu wrote: > > Hi Davidlohr, > > > > I got the below dmesg and the first bad commit is > > > > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master > > commit 0d9ad4220e6d73f63a9eeeaac031b92838f75bb3 > > Author: Davidlohr Bueso <davidlohr@xxxxxx> > > AuthorDate: Thu Mar 6 11:01:48 2014 +1100 > > Commit: Stephen Rothwell <sfr@xxxxxxxxxxxxxxxx> > > CommitDate: Thu Mar 6 11:01:48 2014 +1100 > > > > mm: per-thread vma caching > > > > hwclock: can't open '/dev/misc/rtc': No such file or directory > > Running postinst /etc/rpm-postinsts/100... > > Hmm this kind of errors strike me as dealing with some bogus vma from a > stale cache. > > > [ 3.658976] BUG: unable to handle kernel paging request at ffff880008142f40 > > [ 3.661422] IP: [<ffffffff8111a1d8>] vmacache_find+0x78/0x90 > > [ 3.662223] PGD 2542067 PUD 2543067 PMD fba5067 PTE 8000000008142060 > > [ 3.662223] Oops: 0000 [#1] DEBUG_PAGEALLOC > > [ 3.662223] Modules linked in: > > [ 3.662223] CPU: 0 PID: 326 Comm: 90-trinity Not tainted 3.14.0-rc5-next-20140307 #1 > > Have you only seen this through DEBUG_PAGEALLOC + trinity? Yes. > > [ 3.662223] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > > [ 3.662223] task: ffff8800083020d0 ti: ffff8800082a0000 task.ti: ffff8800082a0000 > > [ 3.662223] RIP: 0010:[<ffffffff8111a1d8>] [<ffffffff8111a1d8>] vmacache_find+0x78/0x90 > > [ 3.662223] RSP: 0000:ffff8800082a1e00 EFLAGS: 00010282 > > [ 3.662223] RAX: ffff880008142f40 RBX: 00000000000000a9 RCX: ffff8800083020d0 > > [ 3.662223] RDX: 0000000000000002 RSI: 00007fff8a141698 RDI: ffff880008124bc0 > > [ 3.662223] RBP: ffff8800082a1e00 R08: 0000000000000000 R09: 0000000000000001 > > [ 3.662223] R10: ffff8800083020d0 R11: 0000000000000000 R12: 00007fff8a141698 > > [ 3.662223] R13: ffff880008124bc0 R14: ffff8800082a1f58 R15: ffff8800083020d0 > > [ 3.662223] FS: 00007fe3ca364700(0000) GS:ffffffff81a06000(0000) knlGS:0000000000000000 > > [ 3.662223] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 3.662223] CR2: ffff880008142f40 CR3: 000000000824c000 CR4: 00000000000006b0 > > [ 3.662223] Stack: > > [ 3.662223] ffff8800082a1e28 ffffffff81125219 00000000000000a9 00007fff8a141698 > > [ 3.662223] ffff880008124bc0 ffff8800082a1f28 ffffffff816d71fe 0000000000000246 > > [ 3.662223] 0000000000000002 ffff880008124c58 0000000000000006 0000000000000010 > > [ 3.662223] Call Trace: > > [ 3.662223] [<ffffffff81125219>] find_vma+0x19/0x70 > > [ 3.662223] [<ffffffff816d71fe>] __do_page_fault+0x29e/0x560 > > [ 3.662223] [<ffffffff8116cc6f>] ? mntput_no_expire+0x6f/0x1a0 > > [ 3.662223] [<ffffffff8116cc11>] ? mntput_no_expire+0x11/0x1a0 > > [ 3.662223] [<ffffffff8116cdd5>] ? mntput+0x35/0x40 > > [ 3.662223] [<ffffffff8114f51f>] ? __fput+0x24f/0x290 > > [ 3.662223] [<ffffffff812794ca>] ? trace_hardirqs_off_thunk+0x3a/0x3c > > [ 3.662223] [<ffffffff816d74ce>] do_page_fault+0xe/0x10 > > [ 3.662223] [<ffffffff816d6ad5>] do_async_page_fault+0x35/0x90 > > [ 3.662223] [<ffffffff816d3b05>] async_page_fault+0x25/0x30 > > [ 3.662223] Code: c7 81 b0 02 00 00 00 00 00 00 eb 32 0f 1f 80 00 00 00 00 31 d2 66 0f 1f 44 00 00 48 63 c2 48 8b 84 c1 98 02 00 00 48 85 c0 74 0b <48> 39 30 77 06 48 3b 70 08 72 0a 83 c2 01 83 fa 04 75 dd 31 c0 > > [ 3.662223] RIP [<ffffffff8111a1d8>] vmacache_find+0x78/0x90 > > So this is: > 0: c7 81 b0 02 00 00 00 movl $0x0,0x2b0(%rcx) > 7: 00 00 00 > a: eb 32 jmp 0x3e > c: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) > 13: 31 d2 xor %edx,%edx > 15: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) > 1b: 48 63 c2 movslq %edx,%rax > 1e: 48 8b 84 c1 98 02 00 mov 0x298(%rcx,%rax,8),%rax > 25: 00 > 26: 48 85 c0 test %rax,%rax > 29: 74 0b je 0x36 > 2b:* 48 39 30 cmp %rsi,(%rax) <-- trapping instruction > > > which seems to be the following, where vma is stale: > if (vma && vma->vm_start <= addr && vma->vm_end > addr) > return vma; > > > 2e: 77 06 ja 0x36 > 30: 48 3b 70 08 cmp 0x8(%rax),%rsi > 34: 72 0a jb 0x40 > 36: 83 c2 01 add $0x1,%edx > 39: 83 fa 04 cmp $0x4,%edx > 3c: 75 dd jne 0x1b > 3e: 31 c0 xor %eax,%eax > > Could you please try this fix: https://lkml.org/lkml/2014/3/10/505 - it > is fix for a race Oleg found that can cause us to keep bogus vmas in the > cache under certain VM_CLONE scenarios. With the patch applied on top of 0d9ad4220e6d73f63a9eeeaac031b92838f75bb3 I still get this: sed: /lib/modules/3.14.0-rc5-00226-g0d9ad422-dirty/modules.dep: No such file or directory xargs: modprobe: No such file or directory run-parts: /etc/kernel-tests/01-modprobe exited with code 127 [ 3.807108] BUG: unable to handle kernel paging request at ffff8800060d4f40 [ 3.807408] IP: [<ffffffff81111438>] vmacache_find+0x78/0x90 [ 3.807408] PGD 23f9067 PUD 23fa067 PMD fbb5067 PTE 80000000060d4060 [ 3.807408] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC [ 3.807408] CPU: 0 PID: 428 Comm: 90-trinity Not tainted 3.14.0-rc5-00226-g0d9ad422-dirty #5 [ 3.807408] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 3.807408] task: ffff88000ca1e050 ti: ffff8800060ce000 task.ti: ffff8800060ce000 [ 3.807408] RIP: 0010:[<ffffffff81111438>] [<ffffffff81111438>] vmacache_find+0x78/0x90 [ 3.807408] RSP: 0000:ffff8800060cfde8 EFLAGS: 00010282 [ 3.807408] RAX: ffff8800060d4f40 RBX: 00007f2c1d835190 RCX: ffff88000ca1e050 [ 3.807408] RDX: 0000000000000002 RSI: 00007f2c1d835190 RDI: ffff880006721b80 [ 3.807408] RBP: ffff8800060cfde8 R08: 0000000000000001 R09: 0000000000000000 [ 3.807408] R10: 0000000000000000 R11: 0000000000000156 R12: ffff880006721b80 [ 3.807408] R13: 00000000ffffffff R14: ffff880006721c18 R15: 00007f2c1d835190 [ 3.807408] FS: 00007f2c1dd87700(0000) GS:ffffffff8187d000(0000) knlGS:0000000000000000 [ 3.807408] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3.807408] CR2: ffff8800060d4f40 CR3: 000000000829d000 CR4: 00000000000006f0 [ 3.807408] Stack: [ 3.807408] ffff8800060cfe18 ffffffff8111d08b ffffffff810378b5 0000000000000246 [ 3.807408] ffff880006721b80 00000000000000a8 ffff8800060cff28 ffffffff81037926 [ 3.807408] 0000000000000000 ffff88000ca1e050 ffff8800060cff58 0000000000000014 > Also, how frequently are you able to trigger this? This one is very reproducible: +--------------------------------------------+---+ | boot_successes | 2 | | boot_failures | 7 | +--------------------------------------------+---+ | BUG:unable_to_handle_kernel_paging_request | 7 | | Oops:PREEMPT_DEBUG_PAGEALLOC | 7 | | RIP:vmacache_find | 7 | | Kernel_panic-not_syncing:Fatal_exception | 7 | +--------------------------------------------+---+ [ 1.641026] BUG: unable to handle kernel paging request at ffff88000926af38 [ 1.642202] IP: [<ffffffff8dce4834>] vmacache_find+0x59/0x69 [ 1.643172] PGD eee4067 PUD eee5067 PMD fb9c067 PTE 800000000926a060 [ 1.644024] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [ 1.644024] Modules linked in: [ 1.644024] CPU: 1 PID: 190 Comm: 90-trinity Not tainted 3.14.0-rc5-next-20140306-06952-g0ffb2fe #3 [ 1.644024] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 1.644024] task: ffff8800093ba010 ti: ffff880009340000 task.ti: ffff880009340000 [ 1.644024] RIP: 0010:[<ffffffff8dce4834>] [<ffffffff8dce4834>] vmacache_find+0x59/0x69 [ 1.644024] RSP: 0000:ffff880009341dc8 EFLAGS: 00010282 [ 1.644024] RAX: ffff8800093ba010 RBX: 00007fffd108de78 RCX: ffff88000926af38 [ 1.644024] RDX: 0000000000000002 RSI: 00007fffd108de78 RDI: ffff8800091cdb80 [ 1.644024] RBP: ffff880009341dc8 R08: 0000000000000000 R09: 0000000000000000 [ 1.644024] R10: 00007fffd108e200 R11: 00007fa9ab552190 R12: 00007fffd108de78 [ 1.644024] R13: ffff8800091cdb80 R14: 00000000000000a9 R15: ffff8800093ba010 [ 1.644024] FS: 00007fa9abaa4700(0000) GS:ffff88000f900000(0000) knlGS:0000000000000000 [ 1.644024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.644024] CR2: ffff88000926af38 CR3: 000000000931a000 CR4: 00000000000006a0 [ 1.644024] Stack: [ 1.644024] ffff880009341df0 ffffffff8dcec5d5 ffff880009341f58 00007fffd108de78 [ 1.644024] 0000000000000006 ffff880009341ef8 ffffffff8e11c375 ffff88000c285c70 [ 1.644024] ffff880000000000 0000000000000002 ffff8800091cdc18 ffff8800091cdb80 [ 1.644024] Call Trace: [ 1.644024] [<ffffffff8dcec5d5>] find_vma+0x14/0x5b [ 1.644024] [<ffffffff8e11c375>] __do_page_fault+0x211/0x399 [ 1.644024] [<ffffffff8dc97214>] ? lock_release+0x150/0x16a [ 1.644024] [<ffffffff8dd1bd6c>] ? rcu_read_unlock+0x1c/0x26 [ 1.644024] [<ffffffff8dca347a>] ? rcu_eqs_exit_common.isra.63+0x18/0xb3 [ 1.644024] [<ffffffff8dca3591>] ? rcu_eqs_exit+0x7c/0x83 [ 1.644024] [<ffffffff8e11c52f>] do_page_fault+0x32/0x4b [ 1.644024] [<ffffffff8e11bd2c>] do_async_page_fault+0x2d/0x89 [ 1.644024] [<ffffffff8e119588>] async_page_fault+0x28/0x30 [ 1.644024] Code: b9 08 00 00 00 31 c0 48 89 d7 f3 ab eb 19 31 d2 48 63 ca 48 8b 8c c8 90 02 00 00 48 85 c9 75 0b ff c2 83 fa 04 75 e9 31 c0 eb 0e <48> 39 31 77 f0 48 39 71 08 76 ea 48 89 c8 5d c3 48 8b 47 08 48 [ 1.644024] RIP [<ffffffff8dce4834>] vmacache_find+0x59/0x69 [ 1.644024] RSP <ffff880009341dc8> [ 1.644024] CR2: ffff88000926af38 [ 1.644024] ---[ end trace 672cd9b20daaa00f ]--- [ 1.644031] BUG: unable to handle kernel paging request at ffff880008c19f38 While this one only shows up once: [ 7.074732] uname (237) used greatest stack depth: 5224 bytes left Kernel tests: Boot OK! [ 7.446110] BUG: Bad rss-counter state mm:ffff88000ae8e400 idx:0 val:-16 [ 7.453003] BUG: Bad rss-counter state mm:ffff88000ae8ea00 idx:0 val:-16 [ 7.458563] BUG: Bad rss-counter state mm:ffff88000ae8c000 idx:0 val:48 [ 7.462573] BUG: Bad rss-counter state mm:ffff88000ae8c600 idx:0 val:-16 [ 7.467516] BUG: unable to handle kernel paging request at ffff88000ae12000 [ 7.470027] IP: [<ffffffff88388fbf>] vmacache_find+0xaf/0x110 [ 7.470027] PGD a0bf067 PUD a0c0067 PMD fb8e067 PTE 800000000ae12060 [ 7.470027] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC [ 7.470027] Modules linked in: [ 7.470027] CPU: 0 PID: 280 Comm: 90-trinity Not tainted 3.14.0-rc5-next-20140306-06952-g0ffb2fe #1 [ 7.470027] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 7.470027] task: ffff88000ae094f0 ti: ffff88000ae48000 task.ti: ffff88000ae48000 [ 7.470027] RIP: 0010:[<ffffffff88388fbf>] [<ffffffff88388fbf>] vmacache_find+0xaf/0x110 [ 7.470027] RSP: 0000:ffff88000ae49df8 EFLAGS: 00010206 [ 7.470027] RAX: ffff88000ae12000 RBX: 000000000043b2f2 RCX: 000000000000e8f5 [ 7.470027] RDX: 000000000000e8f8 RSI: 000000000043b2f2 RDI: ffff88000ae094f0 [ 7.470027] RBP: ffff88000ae49df8 R08: 000000000000e8f9 R09: 0000000000000000 [ 7.470027] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88000ae8f600 [ 7.470027] R13: 00000000000000a8 R14: ffff88000ae8f600 R15: ffff88000ae49f58 [ 7.470027] FS: 00007f582c3bd700(0000) GS:ffffffff88ef7000(0000) knlGS:0000000000000000 [ 7.470027] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 7.470027] CR2: ffff88000ae12000 CR3: 000000000ae52000 CR4: 00000000000006f0 [ 7.470027] Stack: [ 7.470027] ffff88000ae49e18 ffffffff88397e22 000000000043b2f2 0000000000000014 [ 7.470027] ffff88000ae49f18 ffffffff8829be64 0000000000000202 ffff88000ae094f0 [ 7.470027] 0000000000000000 ffff88000ae8f6a8 0000000000000028 ffffffffff0a0000 [ 7.470027] Call Trace: [ 7.470027] [<ffffffff88397e22>] find_vma+0x22/0xc0 [ 7.470027] [<ffffffff8829be64>] __do_page_fault+0x374/0x7d0 [ 7.470027] [<ffffffff885afd80>] ? snprintf+0x40/0x50 [ 7.470027] [<ffffffff885bf692>] ? __this_cpu_preempt_check+0x32/0x40 [ 7.470027] [<ffffffff8829c300>] do_page_fault+0x10/0x20 [ 7.470027] [<ffffffff8829805e>] do_async_page_fault+0x3e/0xe0 [ 7.470027] [<ffffffff889e64f8>] async_page_fault+0x28/0x30 [ 7.470027] Code: a7 01 48 98 48 8b 84 c7 c8 02 00 00 48 85 c0 75 13 0f 1f 00 48 ff 05 d9 7c a7 01 4c 39 c2 75 d4 eb 29 66 90 48 ff 05 b9 7c a7 01 <48> 39 30 77 e4 48 ff 05 b5 7c a7 01 48 3b 70 08 73 d7 eb 3d 0f [ 7.470027] RIP [<ffffffff88388fbf>] vmacache_find+0xaf/0x110 [ 7.470027] RSP <ffff88000ae49df8> [ 7.470027] CR2: ffff88000ae12000 [ 7.470027] ---[ end trace 979b97a5e66af41a ]--- [ 7.470027] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:20 > Thanks for the report. You are welcome. Thanks, Fengguang -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>