On 06/27/2014 02:03 PM, Hugh Dickins wrote:
> On Fri, 27 Jun 2014, Sasha Levin wrote:
>> On 06/27/2014 01:59 AM, Hugh Dickins wrote:
>>>>> First, this:
>>>>>
>>>>> [ 681.267487] BUG: unable to handle kernel paging request at ffffea0003480048
>>>>> [ 681.268621] IP: zap_pte_range (mm/memory.c:1132)
>>> Weird, I don't think we've seen anything like that before, have we?
>>> I'm pretty sure it's not a consequence of my "index = min(index, end)",
>>> but what it portends I don't know. Please confirm mm/memory.c:1132 -
>>> that's the "if (PageAnon(page))" line, isn't it? Which indeed matches
>>> the code below. So accessing page->mapping is causing an oops...
>>
>> Right, that's the correct line.
>>
>> At this point I'm pretty sure that it's somehow related to that one line
>> patch since it reproduced fairly quickly after applying it, and when I
>> removed it I didn't see it happening again during the overnight fuzzing.
>
> Oh, I assumed it was a one-off: you're saying that you saw it more than
> once with the min(index, end) patch in? But not since removing it (did
> you replace that by the newer patch? or by the older? or by nothing?).

It reproduced exactly twice, can't say it happens too often. What I did was revert
your original fix for the issue and apply the one-liner.

I've spent most of yesterday chasing a different bug with a "clean" -next tree (without
the revert and the one-line patch) and didn't see any mm/ issues.
However, about 2 hours after doing the revert and applying the one-line patch I've
encountered the following:

[ 3686.797859] BUG: unable to handle kernel paging request at ffff88028a488f98
[ 3686.805732] IP: do_read_fault.isra.40 (mm/memory.c:2856 mm/memory.c:2889)
[ 3686.805732] PGD 12b82067 PUD 704d49067 PMD 704cf6067 PTE 800000028a488060
[ 3686.805732] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 3686.815852] Dumping ftrace buffer:
[ 3686.815852] (ftrace buffer empty)
[ 3686.815852] Modules linked in:
[ 3686.815852] CPU: 10 PID: 8890 Comm: modprobe Not tainted 3.16.0-rc2-next-20140627-sasha-00024-ga284b83-dirty #753
[ 3686.815852] task: ffff8801d1c20000 ti: ffff8801c6a08000 task.ti: ffff8801c6a08000
[ 3686.826134] RIP: do_read_fault.isra.40 (mm/memory.c:2856 mm/memory.c:2889)
[ 3686.826134] RSP: 0000:ffff8801c6a0bc78 EFLAGS: 00010297
[ 3686.826134] RAX: 0000000000000000 RBX: ffff880288531200 RCX: 000000000000001f
[ 3686.826134] RDX: 0000000000000014 RSI: 00007f22949f3000 RDI: ffff88028a488f98
[ 3686.826134] RBP: ffff8801c6a0bd18 R08: 00007f2294a13000 R09: 000000000000000c
[ 3686.826134] R10: 0000000000000000 R11: 00000000000000a8 R12: 00007f2294a07c50
[ 3686.826134] R13: ffff880279fec4b0 R14: 00007f22949f3000 R15: ffff88028ebbc528
[ 3686.826134] FS: 0000000000000000(0000) GS:ffff880292e00000(0000) knlGS:0000000000000000
[ 3686.826134] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3686.826134] CR2: ffff88028a488f98 CR3: 000000026b766000 CR4: 00000000000006a0
[ 3686.826134] Stack:
[ 3686.826134] ffff8801c6a0bc98 0000000000000001 ffff8802000000a8 0000000000000014
[ 3686.826134] ffff88028a489038 0000000000000000 ffff88026d40d000 ffff88028eafaee0
[ 3686.826134] ffff8801c6a0bcd8 ffffffff8e572715 ffffea000a292240 000000000028a489
[ 3686.826134] Call Trace:
[ 3686.826134] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:98 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:183)
[ 3686.826134] ? __pte_alloc (mm/memory.c:598 mm/memory.c:593)
[ 3686.826134] __handle_mm_fault (mm/memory.c:3037 mm/memory.c:3198 mm/memory.c:3322)
[ 3686.826134] handle_mm_fault (include/linux/memcontrol.h:124 mm/memory.c:3348)
[ 3686.826134] ? __do_page_fault (arch/x86/mm/fault.c:1163)
[ 3686.826134] __do_page_fault (arch/x86/mm/fault.c:1230)
[ 3686.826134] ? vtime_account_user (kernel/sched/cputime.c:687)
[ 3686.826134] ? get_parent_ip (kernel/sched/core.c:2550)
[ 3686.826134] ? context_tracking_user_exit (include/linux/vtime.h:89 include/linux/jump_label.h:115 include/trace/events/context_tracking.h:47 kernel/context_tracking.c:180)
[ 3686.826134] ? preempt_count_sub (kernel/sched/core.c:2606)
[ 3686.826134] ? context_tracking_user_exit (kernel/context_tracking.c:184)
[ 3686.826134] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 3686.826134] ? trace_hardirqs_off_caller (kernel/locking/lockdep.c:2638 (discriminator 2))
[ 3686.826134] trace_do_page_fault (arch/x86/mm/fault.c:1313 include/linux/jump_label.h:115 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1314)
[ 3686.826134] do_async_page_fault (arch/x86/kernel/kvm.c:264)
[ 3686.826134] async_page_fault (arch/x86/kernel/entry_64.S:1322)
[ 3686.826134] Code: 89 c0 4c 8b 43 08 48 8d 4c 08 ff 49 01 c1 49 39 c9 4c 0f 47 c9 4c 89 c1 4c 29 f1 48 c1 e9 0c 49 8d 4c 0a ff 49 39 c9 4c 0f 47 c9 <48> 83 3f 00 74 3c 48 83 c0 01 4c 39 c8 77 74 48 81 c6 00 10 00
All code
========
   0:   89 c0                   mov    %eax,%eax
   2:   4c 8b 43 08             mov    0x8(%rbx),%r8
   6:   48 8d 4c 08 ff          lea    -0x1(%rax,%rcx,1),%rcx
   b:   49 01 c1                add    %rax,%r9
   e:   49 39 c9                cmp    %rcx,%r9
  11:   4c 0f 47 c9             cmova  %rcx,%r9
  15:   4c 89 c1                mov    %r8,%rcx
  18:   4c 29 f1                sub    %r14,%rcx
  1b:   48 c1 e9 0c             shr    $0xc,%rcx
  1f:   49 8d 4c 0a ff          lea    -0x1(%r10,%rcx,1),%rcx
  24:   49 39 c9                cmp    %rcx,%r9
  27:   4c 0f 47 c9             cmova  %rcx,%r9
  2b:*  48 83 3f 00             cmpq   $0x0,(%rdi)      <-- trapping instruction
  2f:   74 3c                   je     0x6d
  31:   48 83 c0 01             add    $0x1,%rax
  35:   4c 39 c8                cmp    %r9,%rax
  38:   77 74                   ja     0xae
  3a:   48 81 c6 00 10 00 00    add    $0x1000,%rsi

Code starting with the faulting instruction
===========================================
   0:   48 83 3f 00             cmpq   $0x0,(%rdi)
   4:   74 3c                   je     0x42
   6:   48 83 c0 01             add    $0x1,%rax
   a:   4c 39 c8                cmp    %r9,%rax
   d:   77 74                   ja     0x83
   f:   48 81 c6 00 10 00 00    add    $0x1000,%rsi
[ 3686.826134] RIP do_read_fault.isra.40 (mm/memory.c:2856 mm/memory.c:2889)
[ 3686.826134] RSP <ffff8801c6a0bc78>
[ 3686.826134] CR2: ffff88028a488f98

Association is not causation but this is pretty suspicious...

> I want to exclaim "That makes no sense!", but bugs don't make sense
> anyway. It's going to be a challenge to work out a connection though.
> I think I want to ask for more attempts to reproduce, with and without
> the min(index, end) patch (if you have enough time - there must be a
> limit to the amount of time you can give me on this).
>
> I rather hoped that the oops on PageAnon might shed light from another
> direction on the outstanding page_mapped bug: both seem like page table
> corruption of some kind (though I've not seen a plausible path to either).
>
> And regarding the page_mapped bug: we've heard nothing since Dave
> Hansen suggested a VM_BUG_ON_PAGE for that - has it gone away now?

Seems like it. I'm carrying Dave's patch still, but haven't seen it triggering.


Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx.
For more info on Linux MM, see: http://www.linux-mm.org/ .