On Thu, Jun 02, 2016 at 10:52:28AM -0400, Dave Anderson wrote:
>
> ----- Original Message -----
> > Dave,
> >
> > When I ran "bt" against a process running in a user mode, I got
> > an odd backtrace result:
> > ===8<===
> > crash> ps
> > ...
> > > 1324 1223 2 ffff80002018be80 RU 0.0 960 468 dhry
> >   1325 2 1 ffff800021089900 IN 0.0 0 0 [kworker/u16:0]
> > crash> bt 1324
> > PID: 1324 TASK: ffff80002018be80 CPU: 2 COMMAND: "dhry"
> > ffff800022f6ae08: ffff00000812ae44 (crash_save_cpu on IRQ stack)
> > #0 [ffff800022f6ae10] crash_save_cpu at ffff00000812ae44
> > #1 [ffff800022f6ae60] handle_IPI at ffff00000808e718
> > #2 [ffff800022f6b020] gic_handle_irq at ffff0000080815f8
> > #3 [ffff800022f6b050] el0_irq_naked at ffff000008084c4c
> > pt_regs: ffff800022f6af60
> > PC: ffffffffffffffff [unknown or invalid address]
> > LR: ffff800020107ed0 [unknown or invalid address]
> > SP: 0000000000000000 PSTATE: 004016a4
> > X29: ffff000008084c4c X28: ffff800022f6b080 X27: ffff000008e60c54
> > X26: ffff800020107ed0 X25: 0000000000001fff X24: 0000000000000003
> > X23: ffff0000080815f8 X22: ffff800022f6b040 X21: 0000000000000000
> > X20: ffff000008bce000 X19: ffff00000808e758 X18: ffff800022f6b010
> > X17: ffff00000808a820 X16: ffff800022f6aff0 X15: 0000000000000000
> > X14: 0000000000000000 X13: 0000000000000000 X12: 0000000000402138
> > X11: ffff000008675850 X10: ffff800022f6afe0 X9: 0000000000000000
> > X8: ffff800022f6afc0 X7: 0000000000000000 X6: 0000000000000000
> > X5: 0000000000000000 X4: 0000000000000001 X3: 0000000000000000
> > X2: 0000000000493000 X1: 0000000000498000 X0: ffffffffffffffff
> > ORIG_X0: 0000000020000000 SYSCALLNO: 4021f0
> > bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff800020107ed0 fp: 0 (?)
> > pt_regs: ffff800020107ed0
> > PC: 00000000004016a4 LR: 00000000004016a4 SP: 0000ffffc10c40a0
> > X29: 0000ffffc10c40a0 X28: 0000000000000000 X27: 0000000000000000
> > X26: 0000000000000000 X25: 0000000000402138 X24: 00000000004021f0
> > X23: 0000000000000000 X22: 0000000000000000 X21: 00000000004001a0
> > X20: 0000000000000000 X19: 0000000000000000 X18: 0000000000000000
> > X17: 0000000000000001 X16: 0000000000000000 X15: 0000000000493000
> > X14: 0000000000498000 X13: ffffffffffffffff X12: 0000000000000005
> > X11: 000000000000001e X10: 0101010101010101 X9: fffffffff59a9190
> > X8: 7f7f7f7f7f7f7f7f X7: 1f535226301f2b4c X6: 00000003001d1000
> > X5: 00101d0003000000 X4: 0000000000000000 X3: 4952545320454d4f
> > X2: 0000000010c35b40 X1: 0000000000000011 X0: 0000000010c35b40
> > ORIG_X0: 0000000000498700 SYSCALLNO: ffffffffffffffff PSTATE: 20000000
> > ===>8===
> >
> > * PC, LR and SP look wrong.
> >   I don't know how those pt_regs values were derived.
> > * The message, "WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp:
> >   ffff800020107ed0 fp: 0 (?)" should be refined.
> >   Apparently, in this case, the process is running in a user mode,
> >   and so there is no normal kernel stack.
>
> Support for IRQ stacks was only recently put in place in crash-7.1.5,
> and obviously backtraces for a crash-while-in-user-space task are not working
> correctly. Unfortunately the only test kdump I have on hand only has IRQ
> stack transitions from kernel space. I tried to create a kdump from a system
> running user-space commands on our 4.5.0-based kernel, but as luck would
> have it, kdump fails to work. (It never even reaches the secondary kernel
> for some reason, even though the kdump facility says it's functional.)
>
> Obviously there's a problem in arm64_unwind_frame() trying to make the transition,
> and it returns FALSE because of the NULL fp and therefore INSTACK(frame->fp, bt)
> fails.
> The function is trying to emulate the kernel's unwind_frame() function,
> which also would return -EINVAL because of the fp. But I'm not sure whether that
> fp value has been set correctly because of the first, seemingly bogus, exception
> frame that it's showing.
>
> As you have seen, kernel space exceptions look like this, where the fp, sp and pc
> values are legitimate, so it prints "-- <IRQ stack> --", and transitions to the
> exception frame on the process stack:
>
> crash> set debug 1
> debug: 1
> crash> bt
> PID: 0 TASK: fffffe035b0aae00 CPU: 3 COMMAND: "swapper/3"
> fffffe03fe183d58: fffffe0000137ee4 (crash_save_cpu on IRQ stack)
> #0 [fffffe03fe183d60] crash_save_cpu at fffffe0000137ee4
> #1 [fffffe03fe183dc0] handle_IPI at fffffe000008e8d4
> #2 [fffffe03fe183f80] gic_handle_irq at fffffe00000824c8
> #3 [fffffe03fe183fd0] el1_irq at fffffe0000083520
> bt: arm64_unwind_frame: switch stacks: fp: fffffe035b0f3f30 sp: fffffe035b0f3e10 pc: fffffe000008611c
> --- <IRQ stack> ---
> pt_regs: fffffe035b0f3e10
> PC: fffffe000008611c [arch_cpu_idle+60]
> LR: fffffe0000086118 [arch_cpu_idle+56]
> SP: fffffe035b0f3f30 PSTATE: 60000145
> X29: fffffe035b0f3f30 X28: 0000000000000000 X27: fffffe0000084170
> X26: fffffe0000bf13dc X25: fffffe0000cf4000 X24: fffffe035b0f0000
> X23: 0000000000000001 X22: fffffe0000b94c48 X21: 0000000000000003
> X20: fffffe0000cf6000 X19: fffffe0000cf6028 X18: 000002aabb090050
> X17: 000003ff9131a228 X16: fffffe000026dba4 X15: 00000000000000bf
> X14: 004894597490a924 X13: 0000000000000000 X12: 0000000000000010
> X11: 0000000000000067 X10: 0000000000000ab0 X9: fffffe035b0f0000
> X8: fffffe035b0ab910 X7: 0000000000007b17 X6: 000000000001c690
> X5: 0000001515d0302c X4: 0100000000000000 X3: fffffe03fe184c8c
> X2: fffffe03fe184c80 X1: 0000000000000000 X0: fffffe035b0f0000
> ORIG_X0: fffffe035b0f0000 SYSCALLNO: fffffe0000b94c48
> #4 [fffffe035b0f3e10] arch_cpu_idle at fffffe000008611c
> #5 [fffffe035b0f3f40] default_idle_call at fffffe00000f81cc
> #6 [fffffe035b0f3f70] cpu_startup_entry at fffffe00000f8320
> #7 [fffffe035b0f3f80] secondary_start_kernel at fffffe000008e338
> crash>
>
> In your sample, it certainly doesn't appear that the first exception frame found
> on the IRQ stack is legitimate, and probably should not pass the test in
> arm64_is_kernel_exception_frame(), but it does:
>
> > crash> bt 1324
> > PID: 1324 TASK: ffff80002018be80 CPU: 2 COMMAND: "dhry"
> > ffff800022f6ae08: ffff00000812ae44 (crash_save_cpu on IRQ stack)
> > #0 [ffff800022f6ae10] crash_save_cpu at ffff00000812ae44
> > #1 [ffff800022f6ae60] handle_IPI at ffff00000808e718
> > #2 [ffff800022f6b020] gic_handle_irq at ffff0000080815f8
> > #3 [ffff800022f6b050] el0_irq_naked at ffff000008084c4c
> > pt_regs: ffff800022f6af60
> > PC: ffffffffffffffff [unknown or invalid address]
> > LR: ffff800020107ed0 [unknown or invalid address]
> > SP: 0000000000000000 PSTATE: 004016a4
> > X29: ffff000008084c4c X28: ffff800022f6b080 X27: ffff000008e60c54
> > X26: ffff800020107ed0 X25: 0000000000001fff X24: 0000000000000003
> > X23: ffff0000080815f8 X22: ffff800022f6b040 X21: 0000000000000000
> > X20: ffff000008bce000 X19: ffff00000808e758 X18: ffff800022f6b010
> > X17: ffff00000808a820 X16: ffff800022f6aff0 X15: 0000000000000000
> > X14: 0000000000000000 X13: 0000000000000000 X12: 0000000000402138
> > X11: ffff000008675850 X10: ffff800022f6afe0 X9: 0000000000000000
> > X8: ffff800022f6afc0 X7: 0000000000000000 X6: 0000000000000000
> > X5: 0000000000000000 X4: 0000000000000001 X3: 0000000000000000
> > X2: 0000000000493000 X1: 0000000000498000 X0: ffffffffffffffff
> > ORIG_X0: 0000000020000000 SYSCALLNO: 4021f0
>
> Maybe that is the cause of the bogus "fp"? Anyway, since the orig_sp is
> from a fixed location at the top of the IRQ stack, it then manages to make its
> way back to the "dhry" process stack, where this exception frame "looks" legitimate:
>
> > bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff800020107ed0 fp: 0 (?)
> > pt_regs: ffff800020107ed0
> > PC: 00000000004016a4 LR: 00000000004016a4 SP: 0000ffffc10c40a0
> > X29: 0000ffffc10c40a0 X28: 0000000000000000 X27: 0000000000000000
> > X26: 0000000000000000 X25: 0000000000402138 X24: 00000000004021f0
> > X23: 0000000000000000 X22: 0000000000000000 X21: 00000000004001a0
> > X20: 0000000000000000 X19: 0000000000000000 X18: 0000000000000000
> > X17: 0000000000000001 X16: 0000000000000000 X15: 0000000000493000
> > X14: 0000000000498000 X13: ffffffffffffffff X12: 0000000000000005
> > X11: 000000000000001e X10: 0101010101010101 X9: fffffffff59a9190
> > X8: 7f7f7f7f7f7f7f7f X7: 1f535226301f2b4c X6: 00000003001d1000
> > X5: 00101d0003000000 X4: 0000000000000000 X3: 4952545320454d4f
> > X2: 0000000010c35b40 X1: 0000000000000011 X0: 0000000010c35b40
> > ORIG_X0: 0000000000498700 SYSCALLNO: ffffffffffffffff PSTATE: 20000000
>
> But I'm not sure what happens when an arm64 IRQ exception occurs when
> the task is running in user space. Does it lay an exception frame down on the
> process stack and then make the transition? (and therefore the user-space frame
> above is legitimate?) Or does the user-space frame get laid down directly on the
> IRQ stack? Unfortunately I don't know enough about arm64 exception handling.

Since I reviewed this IRQ stack patch in LAK-ML, I should be able to help you,
but I don't have enough time to explain in detail this week.

> In any case, the bt should display "-- <IRQ stack> ...", and then dump
> the user-to-kernel-space exception frame, wherever it lies, i.e., either on the
> normal process stack or (maybe?) on the IRQ stack.
>
> Anyway, can you make the vmlinux/vmcore pair available for me to download? You can
> send the details to me offline.

I sent you a message which contains the link to those binaries.
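
For anyone following the thread, here is a minimal sketch of the frame-pointer step being discussed. It is not crash's code. It is a toy model, loosely based on the bounds check in the arm64 kernel's unwind_frame(), showing why a NULL fp ends the walk with -EINVAL, plus a check of the PSTATE mode bits (M[3:0] == 0 means EL0t) to tell a user-mode exception frame from a kernel one. THREAD_SIZE, the dict-based memory and the helper names are assumptions made for illustration only.

```python
EINVAL = 22
THREAD_SIZE = 0x4000    # assumption: 16 KiB stacks, for illustration only
PSR_MODE_MASK = 0xF     # PSTATE.M[3:0]
PSR_MODE_EL0t = 0x0     # AArch64 user mode


def unwind_frame(read_u64, frame):
    """Advance `frame` (dict with fp, sp, pc) by one frame record.

    Returns 0 on success, or -EINVAL when fp does not point into the
    stack that sp currently lies on (a NULL fp always fails this test).
    """
    fp = frame["fp"]
    low = frame["sp"]
    # Round sp up to the next THREAD_SIZE boundary: the top of this stack.
    high = (low + THREAD_SIZE - 1) & ~(THREAD_SIZE - 1)

    if fp < low or fp > high - 0x18 or fp & 0xF:
        return -EINVAL

    # A frame record is two 64-bit words: the saved fp, then the saved lr.
    frame["sp"] = fp + 0x10
    frame["fp"] = read_u64(fp)
    frame["pc"] = read_u64(fp + 8)
    return 0


def is_user_mode(pstate):
    """True when the saved PSTATE says the exception came from EL0."""
    return (pstate & PSR_MODE_MASK) == PSR_MODE_EL0t


if __name__ == "__main__":
    # Hypothetical toy stack: one frame record whose saved fp is NULL.
    mem = {0x13F20: 0x0, 0x13F28: 0xDEAD}
    frame = {"fp": 0x13F20, "sp": 0x13F00, "pc": 0x0}
    print(unwind_frame(mem.__getitem__, frame), frame)  # first step works
    print(unwind_frame(mem.__getitem__, frame))         # NULL fp: -EINVAL
    print(is_user_mode(0x20000000), is_user_mode(0x60000145))
```

Fed the PSTATE values from the two dumps in this thread, 20000000 decodes as user mode and 60000145 as a kernel mode. A check along these lines might let bt announce the stack switch and dump the user-to-kernel frame rather than warn about fp being 0.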

Thanks,
-Takahiro AKASHI

> Thanks,
> Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility