On 06/25/2014 06:36 PM, Hugh Dickins wrote: > Sasha, may I trespass on your time, and ask you to revert the previous > patch from your tree, and give this patch below a try? I am very > interested to learn if in fact it fixes it for you (as it did for me). Hi Hugh, Happy to help, and as I often do I will answer with a question. I've observed two different issues after reverting the original fix and applying this new patch. Both of them seem semi-related, but I'm not sure. First, this: [ 681.267487] BUG: unable to handle kernel paging request at ffffea0003480048 [ 681.268621] IP: zap_pte_range (mm/memory.c:1132) [ 681.269335] PGD 37fcc067 PUD 37fcb067 PMD 0 [ 681.269972] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 681.270952] Dumping ftrace buffer: [ 681.270952] (ftrace buffer empty) [ 681.270952] Modules linked in: [ 681.270952] CPU: 7 PID: 1952 Comm: trinity-c29 Not tainted 3.16.0-rc2-next-20140625-s asha-00025-g2e02e05-dirty #730 [ 681.270952] task: ffff8803e6f58000 ti: ffff8803df050000 task.ti: ffff8803df050000 [ 681.270952] RIP: zap_pte_range (mm/memory.c:1132) [ 681.270952] RSP: 0018:ffff8803df053c58 EFLAGS: 00010246 [ 681.270952] RAX: ffffea0003480040 RBX: ffff8803edae7a70 RCX: 0000000003480040 [ 681.270952] RDX: 00000000d2001730 RSI: 0000000000000000 RDI: 00000000d2001730 [ 681.270952] RBP: ffff8803df053cf8 R08: ffff88000015cc00 R09: 0000000000000000 [ 681.270952] R10: 0000000000000001 R11: 0000000000000000 R12: ffffea0003480040 [ 681.270952] R13: ffff8803df053de8 R14: 00007fc15014f000 R15: 00007fc15014e000 [ 681.270952] FS: 00007fc15031b700(0000) GS:ffff8801ece00000(0000) knlGS:0000000000000 000 [ 681.270952] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 681.270952] CR2: ffffea0003480048 CR3: 000000001a02e000 CR4: 00000000000006a0 [ 681.270952] Stack: [ 681.270952] ffff8803df053de8 00000000d2001000 00000000d2001fff ffff8803e6f58000 [ 681.270952] 0000000000000000 0000000000000001 ffff880404dd8400 ffff8803e6e31900 [ 681.270952] 00000000d2001730 ffff88000015cc00 0000000000000000 ffff8804078f8000 [ 681.270952] Call Trace: [ 681.270952] unmap_single_vma (mm/memory.c:1256 mm/memory.c:1277 mm/memory.c:1301 mm/m emory.c:1346) [ 681.270952] unmap_vmas (mm/memory.c:1375 (discriminator 1)) [ 681.270952] exit_mmap (mm/mmap.c:2797) [ 681.270952] ? preempt_count_sub (kernel/sched/core.c:2606) [ 681.270952] mmput (kernel/fork.c:638) [ 681.270952] do_exit (kernel/exit.c:744) [ 681.270952] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63) [ 681.270952] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/ lockdep.c:2599) [ 681.270952] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607) [ 681.270952] do_group_exit (kernel/exit.c:884) [ 681.270952] SyS_exit_group (kernel/exit.c:895) [ 681.270952] tracesys (arch/x86/kernel/entry_64.S:542) [ 681.270952] Code: e8 cf 39 25 03 49 8b 4c 24 10 48 39 c8 74 1c 48 8b 7d b8 48 c1 e1 0c 48 89 da 48 83 c9 40 4c 89 fe e8 e5 db ff ff 0f 1f 44 00 00 <41> f6 44 24 08 01 74 08 83 6d c8 01 eb 33 66 90 f6 45 a0 40 74 All code ======== 0: e8 cf 39 25 03 callq 0x32539d4 5: 49 8b 4c 24 10 mov 0x10(%r12),%rcx a: 48 39 c8 cmp %rcx,%rax d: 74 1c je 0x2b f: 48 8b 7d b8 mov -0x48(%rbp),%rdi 13: 48 c1 e1 0c shl $0xc,%rcx 17: 48 89 da mov %rbx,%rdx 1a: 48 83 c9 40 or $0x40,%rcx 1e: 4c 89 fe mov %r15,%rsi 21: e8 e5 db ff ff callq 0xffffffffffffdc0b 26: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 2b:* 41 f6 44 24 08 01 testb $0x1,0x8(%r12) <-- trapping instruction 31: 74 08 je 0x3b 33: 83 6d c8 01 subl $0x1,-0x38(%rbp) 37: eb 33 jmp 0x6c 39: 66 90 xchg %ax,%ax 3b: f6 45 a0 40 testb $0x40,-0x60(%rbp) 3f: 74 00 je 0x41 Code starting with the faulting instruction =========================================== 0: 41 f6 44 24 08 01 testb $0x1,0x8(%r12) 6: 74 08 je 0x10 8: 83 6d c8 01 subl $0x1,-0x38(%rbp) c: eb 33 jmp 0x41 e: 66 90 xchg %ax,%ax 10: f6 45 a0 40 testb $0x40,-0x60(%rbp) 14: 74 00 je 0x16 [ 681.270952] RIP zap_pte_range (mm/memory.c:1132) [ 681.270952] RSP <ffff8803df053c58> [ 681.270952] CR2: ffffea0003480048 And a longer lockup that shows a few shmem_fallocate hanging, but they don't seem to be the main reason for the hang (log it pretty long, attached). Thanks, Sasha
Attachment:
out.txt.gz
Description: application/gzip