Re: [PATCH] ppc64: fix 'bt' command for vmcore captured with fadump.


On Thursday 19 January 2017 07:54 PM, Dave Anderson wrote:

----- Original Message -----

On Thursday 19 January 2017 02:05 AM, Dave Anderson wrote:
----- Original Message -----
Without this patch, backtraces of active tasks may be of the form
"#0 [c0000000700b3a90] (null) at c0000000700b3b50  (unreliable)" for
kernel dumps captured with fadump.  Try to use the ptregs saved for
active tasks before falling back to the stack-search method. Also, get
rid of warnings like "‘is_hugepage’ declared inline after being called".

Signed-off-by: Hari Bathini <hbathini@xxxxxxxxxxxxxxxxxx>

Hari,

I only have 1 sample vmcore generated by FADUMP, and I see that
the backtraces of the non-panicking active tasks are an improvement
given that they show the exception frame register set.  However, I also
note that the panic task backtrace has changed, from this using the
current method:

    PID: 1913   TASK: c000000250472120  CPU: 5   COMMAND: "bash"
     #0 [c000000255933620] .crash_fadump at c00000000002cbb8
     #1 [c0000002559336c0] .die at c000000000030dc8
     #2 [c000000255933770] .bad_page_fault at c000000000043748
     #3 [c0000002559337f0] handle_page_fault at c000000000005228
     Data Access [300] exception frame:
     R0:  0000000000000001    R1:  c000000255933ae0    R2:  c000000000f27628
     R3:  0000000000000063    R4:  0000000000000000    R5:  ffffffffffffffff
     R6:  0000000000000070    R7:  00000000000020b8    R8:  000000001cbbfaa8
     R9:  0000000000000000    R10: 0000000000000002    R11: c00000000039c590
     R12: 0000000028242482    R13: c000000000ff3180    R14: 000000001012b3dc
     R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58
     R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000
     R21: 000000001012b3e4    R22: 0000000000000000    R23: c000000000e57788
     R24: 0000000000000004    R25: c000000000e57928    R26: c000000000e37414
     R27: 0000000000000000    R28: 0000000000000001    R29: 0000000000000063
     R30: c000000000ec9208    R31: c000000001423aac
     NIP: c00000000039c57c    MSR: 8000000000009032    OR3: c000000255933a20
     CTR: c00000000039c560    LR:  c00000000039c8c8    XER: 0000000000000001
     CCR: 0000000028242482    MQ:  0000000000000000    DAR: 0000000000000000
     DSISR: 0000000042000000     Syscall Result: 0000000000000000
     #4 [c000000255933ae0] .sysrq_handle_crash at c00000000039c57c
     [Link Register] [c000000255933ae0] .__handle_sysrq at c00000000039c8c8
     #5 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
     #6 [c000000255933c30] .proc_reg_write at c000000000244874
     #7 [c000000255933ce0] .vfs_write at c0000000001c9dac
     #8 [c000000255933d80] .sys_write at c0000000001c9fd8
     #9 [c000000255933e30] syscall_exit at c000000000008564
     System Call [c00] exception frame:
     R0:  0000000000000004    R1:  00000fffec87b540    R2:  00000080cec13268
     R3:  0000000000000001    R4:  00000fffa55a0000    R5:  0000000000000002
     R6:  000000007fffffff    R7:  0000000000000000    R8:  0000000000000001
     R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
     R12: 0000000000000000    R13: 00000080cea0ce10    R14: 000000001012b3dc
     R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58
     R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000
     R21: 000000001012b3e4    R22: 000001003391c720    R23: 0000000000000000
     R24: 0000000000000001    R25: 000000001012b3e0    R26: 00000fffec87b86c
     R27: 00000fffec87b868    R28: 0000000000000002    R29: 00000080cec006a0
     R30: 00000fffa55a0000    R31: 0000000000000002
     NIP: 00000080ceb49548    MSR: 800000000000d032    OR3: 0000000000000001
     CTR: 00000080cead9d50    LR:  00000080cead9db8    XER: 0000000000000000
     CCR: 0000000044242424    MQ:  0000000000000001    DAR: 00000100339436b8
     DSISR: 0000000042000000     Syscall Result: 0000000000000000
to this with your patch, where the exception backtrace is missing:

    PID: 1913   TASK: c000000250472120  CPU: 5   COMMAND: "bash"
     R0:  0000000000000001    R1:  c000000255933ae0    R2:  c000000000f27628
     R3:  0000000000000063    R4:  0000000000000000    R5:  ffffffffffffffff
     R6:  0000000000000070    R7:  00000000000020b8    R8:  000000001cbbfaa8
     R9:  0000000000000000    R10: 0000000000000002    R11: c00000000039c590
     R12: 0000000028242482    R13: c000000000ff3180    R14: 000000001012b3dc
     R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58
     R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000
     R21: 000000001012b3e4    R22: 0000000000000000    R23: c000000000e57788
     R24: 0000000000000004    R25: c000000000e57928    R26: c000000000e37414
     R27: 0000000000000000    R28: 0000000000000001    R29: 0000000000000063
     R30: c000000000ec9208    R31: c000000001423aac
     NIP: c00000000039c57c    MSR: 8000000000009032    OR3: c000000255933a20
     CTR: c00000000039c560    LR:  c00000000039c8c8    XER: 0000000000000001
     CCR: 0000000028242482    MQ:  0000000000000000    DAR: 0000000000000000
     DSISR: 0000000042000000     Syscall Result: 0000000000000000
     NIP [c00000000039c57c] .sysrq_handle_crash
     LR  [c00000000039c8c8] .__handle_sysrq
     #0 [c000000255933ae0] .__handle_sysrq at c00000000039c89c
     #1 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
     #2 [c000000255933c30] .proc_reg_write at c000000000244874
     #3 [c000000255933ce0] .vfs_write at c0000000001c9dac
     #4 [c000000255933d80] .sys_write at c0000000001c9fd8
     #5 [c000000255933e30] syscall_exit at c000000000008564
     System Call [c00] exception frame:
     R0:  0000000000000004    R1:  00000fffec87b540    R2:  00000080cec13268
     R3:  0000000000000001    R4:  00000fffa55a0000    R5:  0000000000000002
     R6:  000000007fffffff    R7:  0000000000000000    R8:  0000000000000001
     R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
     R12: 0000000000000000    R13: 00000080cea0ce10    R14: 000000001012b3dc
     R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58
     R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000
     R21: 000000001012b3e4    R22: 000001003391c720    R23: 0000000000000000
     R24: 0000000000000001    R25: 000000001012b3e0    R26: 00000fffec87b86c
     R27: 00000fffec87b868    R28: 0000000000000002    R29: 00000080cec006a0
     R30: 00000fffa55a0000    R31: 0000000000000002
     NIP: 00000080ceb49548    MSR: 800000000000d032    OR3: 0000000000000001
     CTR: 00000080cead9d50    LR:  00000080cead9db8    XER: 0000000000000000
     CCR: 0000000044242424    MQ:  0000000000000001    DAR: 00000100339436b8
     DSISR: 0000000042000000     Syscall Result: 0000000000000000


And then on a rhel7 traditional KDUMP dumpfile, both the panic task and the
non-panicking active tasks are missing the exception trace.  Here's a sample
panic task backtrace using the current manner:

    PID: 32696  TASK: c0000001922ed5d0  CPU: 1   COMMAND: "runtest.sh"
     #0 [c000000019823610] .crash_kexec at c0000000001725e0
     #1 [c000000019823810] .die at c000000000020a48
     #2 [c0000000198238c0] .bad_page_fault at c0000000000530d8
     #3 [c000000019823940] handle_page_fault at c000000000009584
     Data Access [300] exception frame:
     R0:  c00000000055cf88    R1:  c000000019823c30    R2:  c00000000130a780
     R3:  0000000000000063    R4:  c000000001845888    R5:  c0000000018564f8
     R6:  0000000000005194    R7:  c0000000014b99a0    R8:  c000000000cca780
     R9:  0000000000000001    R10: 0000000000000000    R11: 000000000000012f
     R12: 0000000048222842    R13: c000000007b80900    R14: 0000000010142550
     R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000
     R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0
     R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000
     R24: 0000000000000001    R25: 0000000000000007    R26: c00000000120b170
     R27: 0000000000000063    R28: c000000001709c98    R29: c00000000120b530
     R30: c0000000011d8fa0    R31: 0000000000000002
     NIP: c00000000055c3f8    MSR: 8000000000009032    OR3: c000000000009358
     CTR: c00000000055c3e0    LR:  c00000000055cfac    XER: 0000000000000001
     CCR: 0000000048222822    MQ:  0000000000000000    DAR: 0000000000000000
     DSISR: 0000000042000000     Syscall Result: 0000000000000000
     #4 [c000000019823c30] .sysrq_handle_crash at c00000000055c3f8
     [Link Register] [c000000019823c30] .write_sysrq_trigger at c00000000055cfac
     #5 [c000000019823cf0] .proc_reg_write at c00000000037d120
     #6 [c000000019823d80] .sys_write at c0000000002d68e4
     #7 [c000000019823e30] syscall_exit at c00000000000a17c
     System Call [c00] exception frame:
     R0:  0000000000000004    R1:  00003fffc7738e00    R2:  00003fffb4163cc0
     R3:  0000000000000001    R4:  00003fffad680000    R5:  0000000000000002
     R6:  0000000000000010    R7:  0000000000000000    R8:  0000000000000000
     R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
     R12: 0000000000000000    R13: 00003fffb426c330    R14: 0000000010142550
     R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000
     R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0
     R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000
     R24: 0000000010143ce0    R25: 00000000100f65d0    R26: 00000100277ffa20
     R27: 0000000000000001    R28: 0000000000000002    R29: 00003fffb4151108
     R30: 00003fffad680000    R31: 0000000000000002
     NIP: 00003fffb408a120    MSR: 800000000280f032    OR3: 0000000000000001
     CTR: 0000000000000000    LR:  00003fffb4015704    XER: 0000000000000000
     CCR: 0000000048222882    MQ:  0000000000000001    DAR: 00003fffad680000
     DSISR: 0000000042000000     Syscall Result: 0000000000000000

And here it is with your patch:

    PID: 32696  TASK: c0000001922ed5d0  CPU: 1   COMMAND: "runtest.sh"
     R0:  c00000000055cf88    R1:  c000000019823c30    R2:  c00000000130a780
     R3:  0000000000000063    R4:  c000000001845888    R5:  c0000000018564f8
     R6:  0000000000005194    R7:  c0000000014b99a0    R8:  c000000000cca780
     R9:  0000000000000001    R10: 0000000000000000    R11: 000000000000012f
     R12: 0000000048222842    R13: c000000007b80900    R14: 0000000010142550
     R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000
     R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0
     R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000
     R24: 0000000000000001    R25: 0000000000000007    R26: c00000000120b170
     R27: 0000000000000063    R28: c000000001709c98    R29: c00000000120b530
     R30: c0000000011d8fa0    R31: 0000000000000002
     NIP: c00000000055c3f8    MSR: 8000000000009032    OR3: c000000000009358
     CTR: c00000000055c3e0    LR:  c00000000055cfac    XER: 0000000000000001
     CCR: 0000000048222822    MQ:  0000000000000000    DAR: 0000000000000000
     DSISR: 0000000042000000     Syscall Result: 0000000000000000
     NIP [c00000000055c3f8] .sysrq_handle_crash
     LR  [c00000000055cfac] .write_sysrq_trigger
     #0 [c000000019823c30] .write_sysrq_trigger at c00000000055cf88
     #1 [c000000019823cf0] .proc_reg_write at c00000000037d120
     #2 [c000000019823d80] .sys_write at c0000000002d68e4
     #3 [c000000019823e30] syscall_exit at c00000000000a17c
     System Call [c00] exception frame:
     R0:  0000000000000004    R1:  00003fffc7738e00    R2:  00003fffb4163cc0
     R3:  0000000000000001    R4:  00003fffad680000    R5:  0000000000000002
     R6:  0000000000000010    R7:  0000000000000000    R8:  0000000000000000
     R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
     R12: 0000000000000000    R13: 00003fffb426c330    R14: 0000000010142550
     R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000
     R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0
     R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000
     R24: 0000000010143ce0    R25: 00000000100f65d0    R26: 00000100277ffa20
     R27: 0000000000000001    R28: 0000000000000002    R29: 00003fffb4151108
     R30: 00003fffad680000    R31: 0000000000000002
     NIP: 00003fffb408a120    MSR: 800000000280f032    OR3: 0000000000000001
     CTR: 0000000000000000    LR:  00003fffb4015704    XER: 0000000000000000
     CCR: 0000000048222882    MQ:  0000000000000001    DAR: 00003fffad680000
     DSISR: 0000000042000000     Syscall Result: 0000000000000000

And from the same kdump, here's a non-panicking active task with the current
way of doing things:

    PID: 0      TASK: c000000001241c00  CPU: 0   COMMAND: "swapper/0"
     #0 [c0000001dffdfb90] .crash_ipi_callback at c00000000004fd44
     #1 [c0000001dffdfc20] .smp_ipi_demux at c000000000046bf8
     #2 [c0000001dffdfcb0] .icp_hv_ipi_action at c000000000073454
     #3 [c0000001dffdfd30] .handle_irq_event_percpu at c0000000001afaa4
     #4 [c0000001dffdfe10] .handle_percpu_irq at c0000000001b526c
     #5 [c0000001dffdfe90] .generic_handle_irq at c0000000001aed1c
     #6 [c0000001dffdff10] .__do_irq at c000000000010d44
     #7 [c0000001dffdff90] .call_do_irq at c000000000023f60
     #8 [c00000000130b7e0] .do_IRQ at c000000000010eec
     #9 [c00000000130b880] hardware_interrupt_common at c000000000002614
     Hardware Interrupt [501] exception frame:
     R0:  0000000000000000    R1:  c00000000130bb70    R2:  c00000000130a780
     R3:  0000000000000000    R4:  0000000000000000    R5:  800000000bb71120
     R6:  800000000bb844f8    R7:  0000000000000000    R8:  0000000000000000
     R9:  0000000000000040    R10: 0000000000000000    R11: 000000005f9c862a
     R12: 0000000000000000    R13: c000000007b80000
     NIP: c0000000000849b4    MSR: 8000000000009032    OR3: 0000000000000c00
     CTR: 0000000000000000    LR:  c000000000710070    XER: 0000000000000000
     CCR: 0000000024002084    MQ:  0000000000000001    DAR: c000000001818380
     DSISR: c000000000157684     Syscall Result: 0000000000000000
    #10 [c00000000130bb70] .plpar_hcall_norets at c0000000000849b4
    [Link Register] [c00000000130bb70] .shared_cede_loop at c000000000710070
    #11 [c00000000130bbf0] .cpuidle_idle_call at c00000000070d9b4
    #12 [c00000000130bcc0] .pseries_lpar_idle at c0000000000872f0
    #13 [c00000000130bd30] .arch_cpu_idle at c000000000017b44
    #14 [c00000000130bdb0] .cpu_startup_entry at c000000000149b10
    #15 [c00000000130be80] .rest_init at c00000000000c5f4
    #16 [c00000000130bef0] .start_kernel at c000000000c34258
    #17 [c00000000130bf90] start_here_common at c000000000009b6c

and here with your patch applied:

    PID: 0      TASK: c000000001241c00  CPU: 0   COMMAND: "swapper/0"
     R0:  0000000000000000    R1:  c00000000130bb70    R2:  c00000000130a780
     R3:  0000000000000000    R4:  0000000000000000    R5:  800000000bb71120
     R6:  800000000bb844f8    R7:  0000000000000000    R8:  0000000000000000
     R9:  0000000000000040    R10: 0000000000000000    R11: 000000005f9c862a
     R12: 0000000000000000    R13: c000000007b80000
     NIP: c0000000000849b4    MSR: 8000000000009032    OR3: 0000000000000c00
     CTR: 0000000000000000    LR:  c000000000710070    XER: 0000000000000000
     CCR: 0000000024002084    MQ:  0000000000000001    DAR: c000000001818380
     DSISR: c000000000157684     Syscall Result: 0000000000000000
     NIP [c0000000000849b4] .plpar_hcall_norets
     LR  [c000000000710070] .shared_cede_loop
     #0 [c00000000130bb70] (null) at 3  (unreliable)
     #1 [c00000000130bbf0] .cpuidle_idle_call at c00000000070d9b4
     #2 [c00000000130bcc0] .pseries_lpar_idle at c0000000000872f0
     #3 [c00000000130bd30] .arch_cpu_idle at c000000000017b44
     #4 [c00000000130bdb0] .cpu_startup_entry at c000000000149b10
     #5 [c00000000130be80] .rest_init at c00000000000c5f4
     #6 [c00000000130bef0] .start_kernel at c000000000c34258
     #7 [c00000000130bf90] start_here_common at c000000000009b6c

Is that what you really want?

It would be unfortunate to lose all of that exception information, both
for the panic and for all of the non-panicking active tasks.
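
To make the comparison above concrete, here is a minimal sketch that lists which numbered frames disappear between two `bt` outputs. It is not part of the crash utility or of the patch; the names `FRAME_RE`, `frame_symbols`, and `lost_frames` are hypothetical, and the sample text is an abridged copy of the FADUMP panic-task traces quoted above (register dumps omitted).

```python
import re

# Matches numbered 'bt' frame lines such as
#   "#4 [c000000255933ae0] .sysrq_handle_crash at c00000000039c57c"
# The "NIP [...]" / "LR [...]" lines deliberately do not match.
FRAME_RE = re.compile(r"^\s*#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")

def frame_symbols(bt_text):
    """Return the symbol of every numbered frame, in output order."""
    symbols = []
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            symbols.append(m.group(3))
    return symbols

def lost_frames(before, after):
    """Symbols that are numbered frames in 'before' but not in 'after'."""
    kept = set(frame_symbols(after))
    return [s for s in frame_symbols(before) if s not in kept]

# Abridged from the FADUMP panic task (PID 1913) shown in this thread.
BEFORE = """
 #0 [c000000255933620] .crash_fadump at c00000000002cbb8
 #1 [c0000002559336c0] .die at c000000000030dc8
 #2 [c000000255933770] .bad_page_fault at c000000000043748
 #3 [c0000002559337f0] handle_page_fault at c000000000005228
 #4 [c000000255933ae0] .sysrq_handle_crash at c00000000039c57c
 #5 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
 #6 [c000000255933c30] .proc_reg_write at c000000000244874
 #7 [c000000255933ce0] .vfs_write at c0000000001c9dac
 #8 [c000000255933d80] .sys_write at c0000000001c9fd8
 #9 [c000000255933e30] syscall_exit at c000000000008564
"""

AFTER = """
 NIP [c00000000039c57c] .sysrq_handle_crash
 LR  [c00000000039c8c8] .__handle_sysrq
 #0 [c000000255933ae0] .__handle_sysrq at c00000000039c89c
 #1 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
 #2 [c000000255933c30] .proc_reg_write at c000000000244874
 #3 [c000000255933ce0] .vfs_write at c0000000001c9dac
 #4 [c000000255933d80] .sys_write at c0000000001c9fd8
 #5 [c000000255933e30] syscall_exit at c000000000008564
"""

if __name__ == "__main__":
    for sym in lost_frames(BEFORE, AFTER):
        print("lost frame:", sym)
```

On this sample the lost frames are .crash_fadump, .die, .bad_page_fault, handle_page_fault, and .sysrq_handle_crash; the last one survives only as the NIP line, not as a numbered frame.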

Hi Dave,

Unfortunate, yes. But I think the exception information we are going to
lose would relate to crash_ipi_callback, crash_kexec, crash_fadump, or
some such, which may not be significant in debugging?
At least, that was the assumption with which I posted this patch.

While that is true in the case of crash IPI callbacks, they are legitimate
parts of the trace, and it's worth "exercising" that backtrace path.  Have
you tested a crash that actually occurred while running on the hard or
soft IRQ stack?

Also, the exception frame doesn't even show the [bracketed] type of exception
that occurred -- it's just a register dump followed by the remainder of the
backtrace.  Upon a quick glance, it's not obvious that they are even active
tasks.  And traditionally, all of the other architectures have always dumped
a full trace.

I'm not sure what the mechanism is for shutting down the non-active
FADUMP tasks, so that's why I asked if you could restrict this change
to just those types of dumps.  (For that matter, is it even possible to
differentiate a real kdump from an FADUMP dumpfile --  aside from a

Hi Dave,

Differentiating a kdump dumpfile from a fadump dumpfile is not possible, except
that with fadump the stack search would invariably fail and ptregs are guaranteed
to be saved by firmware. I have posted a v2 that changes bt output only for
active tasks in the fadump case.
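
One way to read the rule described above, as a rough sketch only: it is not the v2 patch and not crash utility code. The `Task` type, its fields, and `choose_bt_method` are hypothetical names, and the assumption is that "stack search failed" plus "ptregs are available" is the only signal used to recognize the fadump case.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    pid: int
    is_active: bool                 # was running on a CPU at dump time
    saved_ptregs: Optional[dict]    # register set saved for the task, if any

def choose_bt_method(task, stack_search_ok):
    """Pick how to start the backtrace for one task.

    Assumed policy: output changes only for active tasks whose stack
    search failed and for which saved ptregs exist (the fadump case).
    """
    if not task.is_active:
        return "normal"          # sleeping tasks are untouched
    if stack_search_ok:
        return "stack-search"    # kdump-style dump: keep the full trace
    if task.saved_ptregs is not None:
        return "ptregs"          # fadump-style dump: start from saved registers
    return "unreliable"          # nothing better to go on

if __name__ == "__main__":
    fadump_task = Task(pid=1913, is_active=True,
                       saved_ptregs={"nip": 0xc00000000039c57c})
    print(choose_bt_method(fadump_task, stack_search_ok=False))
```

Trying the stack search first is what would keep the exception frames in the kdump traces quoted earlier, since the ptregs path is reached only when the search fails.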

Thanks
Hari

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility



