Re: [PATCH] ppc64: fix 'bt' command for vmcore captured with fadump.

On Thursday 19 January 2017 02:05 AM, Dave Anderson wrote:

----- Original Message -----
Without this patch, backtraces of active tasks may be of the form
"#0 [c0000000700b3a90] (null) at c0000000700b3b50  (unreliable)" for
kernel dumps captured with fadump.  This patch tries to use the ptregs
saved for active tasks before falling back to the stack-search method.
It also gets rid of warnings like "‘is_hugepage’ declared inline after
being called".

Signed-off-by: Hari Bathini <hbathini@xxxxxxxxxxxxxxxxxx>
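
The ordering described in the patch summary above (saved registers first, stack search only as a fallback) can be sketched roughly as follows. This is an illustration in Python, not code from crash or from the patch: `SavedRegs`, `Task`, `stack_search` and `pick_start` are hypothetical names, and the stack search is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SavedRegs:
    """Hypothetical stand-in for a saved ppc64 register set."""
    nip: int  # next instruction pointer
    r1: int   # stack pointer (R1 on ppc64)


@dataclass
class Task:
    """Hypothetical, simplified view of a task found in the dump."""
    active: bool
    saved_regs: Optional[SavedRegs] = None


def stack_search(task: Task) -> Tuple[int, int]:
    """Placeholder for the existing stack-search heuristic (assumption)."""
    return (0, 0)


def pick_start(task: Task) -> Tuple[str, int, int]:
    """Return (source, ip, sp) describing where the backtrace begins."""
    # Prefer the saved registers, but only for active tasks that have them.
    if task.active and task.saved_regs is not None:
        regs = task.saved_regs
        return ("ptregs", regs.nip, regs.r1)
    # Otherwise fall back to searching the stack.
    ip, sp = stack_search(task)
    return ("stack-search", ip, sp)
```

For instance, a task built from the NIP and R1 values in the panic-task output further down (`SavedRegs(nip=0xc00000000039c57c, r1=0xc000000255933ae0)`) would start from "ptregs", while an active task without saved registers would fall through to "stack-search".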
Hari,

I only have 1 sample vmcore generated by FADUMP, and I see that
the backtraces of the non-panicking active tasks are an improvement
given that they show the exception frame register set.  However, I also
note that the panic task backtrace has changed, from this using the
current method:

   PID: 1913   TASK: c000000250472120  CPU: 5   COMMAND: "bash"
    #0 [c000000255933620] .crash_fadump at c00000000002cbb8
    #1 [c0000002559336c0] .die at c000000000030dc8
    #2 [c000000255933770] .bad_page_fault at c000000000043748
    #3 [c0000002559337f0] handle_page_fault at c000000000005228
    Data Access [300] exception frame:
    R0:  0000000000000001    R1:  c000000255933ae0    R2:  c000000000f27628
    R3:  0000000000000063    R4:  0000000000000000    R5:  ffffffffffffffff
    R6:  0000000000000070    R7:  00000000000020b8    R8:  000000001cbbfaa8
    R9:  0000000000000000    R10: 0000000000000002    R11: c00000000039c590
    R12: 0000000028242482    R13: c000000000ff3180    R14: 000000001012b3dc
    R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58
    R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000
    R21: 000000001012b3e4    R22: 0000000000000000    R23: c000000000e57788
    R24: 0000000000000004    R25: c000000000e57928    R26: c000000000e37414
    R27: 0000000000000000    R28: 0000000000000001    R29: 0000000000000063
    R30: c000000000ec9208    R31: c000000001423aac
    NIP: c00000000039c57c    MSR: 8000000000009032    OR3: c000000255933a20
    CTR: c00000000039c560    LR:  c00000000039c8c8    XER: 0000000000000001
    CCR: 0000000028242482    MQ:  0000000000000000    DAR: 0000000000000000
    DSISR: 0000000042000000     Syscall Result: 0000000000000000
    #4 [c000000255933ae0] .sysrq_handle_crash at c00000000039c57c
    [Link Register] [c000000255933ae0] .__handle_sysrq at c00000000039c8c8
    #5 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
    #6 [c000000255933c30] .proc_reg_write at c000000000244874
    #7 [c000000255933ce0] .vfs_write at c0000000001c9dac
    #8 [c000000255933d80] .sys_write at c0000000001c9fd8
    #9 [c000000255933e30] syscall_exit at c000000000008564
    System Call [c00] exception frame:
    R0:  0000000000000004    R1:  00000fffec87b540    R2:  00000080cec13268
    R3:  0000000000000001    R4:  00000fffa55a0000    R5:  0000000000000002
    R6:  000000007fffffff    R7:  0000000000000000    R8:  0000000000000001
    R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
    R12: 0000000000000000    R13: 00000080cea0ce10    R14: 000000001012b3dc
    R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58
    R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000
    R21: 000000001012b3e4    R22: 000001003391c720    R23: 0000000000000000
    R24: 0000000000000001    R25: 000000001012b3e0    R26: 00000fffec87b86c
    R27: 00000fffec87b868    R28: 0000000000000002    R29: 00000080cec006a0
    R30: 00000fffa55a0000    R31: 0000000000000002
    NIP: 00000080ceb49548    MSR: 800000000000d032    OR3: 0000000000000001
    CTR: 00000080cead9d50    LR:  00000080cead9db8    XER: 0000000000000000
    CCR: 0000000044242424    MQ:  0000000000000001    DAR: 00000100339436b8
    DSISR: 0000000042000000     Syscall Result: 0000000000000000
to this with your patch, where the exception backtrace is missing:

   PID: 1913   TASK: c000000250472120  CPU: 5   COMMAND: "bash"
    R0:  0000000000000001    R1:  c000000255933ae0    R2:  c000000000f27628
    R3:  0000000000000063    R4:  0000000000000000    R5:  ffffffffffffffff
    R6:  0000000000000070    R7:  00000000000020b8    R8:  000000001cbbfaa8
    R9:  0000000000000000    R10: 0000000000000002    R11: c00000000039c590
    R12: 0000000028242482    R13: c000000000ff3180    R14: 000000001012b3dc
    R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58
    R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000
    R21: 000000001012b3e4    R22: 0000000000000000    R23: c000000000e57788
    R24: 0000000000000004    R25: c000000000e57928    R26: c000000000e37414
    R27: 0000000000000000    R28: 0000000000000001    R29: 0000000000000063
    R30: c000000000ec9208    R31: c000000001423aac
    NIP: c00000000039c57c    MSR: 8000000000009032    OR3: c000000255933a20
    CTR: c00000000039c560    LR:  c00000000039c8c8    XER: 0000000000000001
    CCR: 0000000028242482    MQ:  0000000000000000    DAR: 0000000000000000
    DSISR: 0000000042000000     Syscall Result: 0000000000000000
    NIP [c00000000039c57c] .sysrq_handle_crash
    LR  [c00000000039c8c8] .__handle_sysrq
    #0 [c000000255933ae0] .__handle_sysrq at c00000000039c89c
    #1 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
    #2 [c000000255933c30] .proc_reg_write at c000000000244874
    #3 [c000000255933ce0] .vfs_write at c0000000001c9dac
    #4 [c000000255933d80] .sys_write at c0000000001c9fd8
    #5 [c000000255933e30] syscall_exit at c000000000008564
    System Call [c00] exception frame:
    R0:  0000000000000004    R1:  00000fffec87b540    R2:  00000080cec13268
    R3:  0000000000000001    R4:  00000fffa55a0000    R5:  0000000000000002
    R6:  000000007fffffff    R7:  0000000000000000    R8:  0000000000000001
    R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
    R12: 0000000000000000    R13: 00000080cea0ce10    R14: 000000001012b3dc
    R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58
    R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000
    R21: 000000001012b3e4    R22: 000001003391c720    R23: 0000000000000000
    R24: 0000000000000001    R25: 000000001012b3e0    R26: 00000fffec87b86c
    R27: 00000fffec87b868    R28: 0000000000000002    R29: 00000080cec006a0
    R30: 00000fffa55a0000    R31: 0000000000000002
    NIP: 00000080ceb49548    MSR: 800000000000d032    OR3: 0000000000000001
    CTR: 00000080cead9d50    LR:  00000080cead9db8    XER: 0000000000000000
    CCR: 0000000044242424    MQ:  0000000000000001    DAR: 00000100339436b8
    DSISR: 0000000042000000     Syscall Result: 0000000000000000


And then on a RHEL7 traditional KDUMP dumpfile, with your patch both the
panic task and the non-panicking active tasks are missing the exception
trace.  Here's a sample panic task backtrace using the current method:

   PID: 32696  TASK: c0000001922ed5d0  CPU: 1   COMMAND: "runtest.sh"
    #0 [c000000019823610] .crash_kexec at c0000000001725e0
    #1 [c000000019823810] .die at c000000000020a48
    #2 [c0000000198238c0] .bad_page_fault at c0000000000530d8
    #3 [c000000019823940] handle_page_fault at c000000000009584
    Data Access [300] exception frame:
    R0:  c00000000055cf88    R1:  c000000019823c30    R2:  c00000000130a780
    R3:  0000000000000063    R4:  c000000001845888    R5:  c0000000018564f8
    R6:  0000000000005194    R7:  c0000000014b99a0    R8:  c000000000cca780
    R9:  0000000000000001    R10: 0000000000000000    R11: 000000000000012f
    R12: 0000000048222842    R13: c000000007b80900    R14: 0000000010142550
    R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000
    R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0
    R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000
    R24: 0000000000000001    R25: 0000000000000007    R26: c00000000120b170
    R27: 0000000000000063    R28: c000000001709c98    R29: c00000000120b530
    R30: c0000000011d8fa0    R31: 0000000000000002
    NIP: c00000000055c3f8    MSR: 8000000000009032    OR3: c000000000009358
    CTR: c00000000055c3e0    LR:  c00000000055cfac    XER: 0000000000000001
    CCR: 0000000048222822    MQ:  0000000000000000    DAR: 0000000000000000
    DSISR: 0000000042000000     Syscall Result: 0000000000000000
    #4 [c000000019823c30] .sysrq_handle_crash at c00000000055c3f8
    [Link Register] [c000000019823c30] .write_sysrq_trigger at c00000000055cfac
    #5 [c000000019823cf0] .proc_reg_write at c00000000037d120
    #6 [c000000019823d80] .sys_write at c0000000002d68e4
    #7 [c000000019823e30] syscall_exit at c00000000000a17c
    System Call [c00] exception frame:
    R0:  0000000000000004    R1:  00003fffc7738e00    R2:  00003fffb4163cc0
    R3:  0000000000000001    R4:  00003fffad680000    R5:  0000000000000002
    R6:  0000000000000010    R7:  0000000000000000    R8:  0000000000000000
    R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
    R12: 0000000000000000    R13: 00003fffb426c330    R14: 0000000010142550
    R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000
    R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0
    R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000
    R24: 0000000010143ce0    R25: 00000000100f65d0    R26: 00000100277ffa20
    R27: 0000000000000001    R28: 0000000000000002    R29: 00003fffb4151108
    R30: 00003fffad680000    R31: 0000000000000002
    NIP: 00003fffb408a120    MSR: 800000000280f032    OR3: 0000000000000001
    CTR: 0000000000000000    LR:  00003fffb4015704    XER: 0000000000000000
    CCR: 0000000048222882    MQ:  0000000000000001    DAR: 00003fffad680000
    DSISR: 0000000042000000     Syscall Result: 0000000000000000

And here it is with your patch:

   PID: 32696  TASK: c0000001922ed5d0  CPU: 1   COMMAND: "runtest.sh"
    R0:  c00000000055cf88    R1:  c000000019823c30    R2:  c00000000130a780
    R3:  0000000000000063    R4:  c000000001845888    R5:  c0000000018564f8
    R6:  0000000000005194    R7:  c0000000014b99a0    R8:  c000000000cca780
    R9:  0000000000000001    R10: 0000000000000000    R11: 000000000000012f
    R12: 0000000048222842    R13: c000000007b80900    R14: 0000000010142550
    R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000
    R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0
    R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000
    R24: 0000000000000001    R25: 0000000000000007    R26: c00000000120b170
    R27: 0000000000000063    R28: c000000001709c98    R29: c00000000120b530
    R30: c0000000011d8fa0    R31: 0000000000000002
    NIP: c00000000055c3f8    MSR: 8000000000009032    OR3: c000000000009358
    CTR: c00000000055c3e0    LR:  c00000000055cfac    XER: 0000000000000001
    CCR: 0000000048222822    MQ:  0000000000000000    DAR: 0000000000000000
    DSISR: 0000000042000000     Syscall Result: 0000000000000000
    NIP [c00000000055c3f8] .sysrq_handle_crash
    LR  [c00000000055cfac] .write_sysrq_trigger
    #0 [c000000019823c30] .write_sysrq_trigger at c00000000055cf88
    #1 [c000000019823cf0] .proc_reg_write at c00000000037d120
    #2 [c000000019823d80] .sys_write at c0000000002d68e4
    #3 [c000000019823e30] syscall_exit at c00000000000a17c
    System Call [c00] exception frame:
    R0:  0000000000000004    R1:  00003fffc7738e00    R2:  00003fffb4163cc0
    R3:  0000000000000001    R4:  00003fffad680000    R5:  0000000000000002
    R6:  0000000000000010    R7:  0000000000000000    R8:  0000000000000000
    R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
    R12: 0000000000000000    R13: 00003fffb426c330    R14: 0000000010142550
    R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000
    R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0
    R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000
    R24: 0000000010143ce0    R25: 00000000100f65d0    R26: 00000100277ffa20
    R27: 0000000000000001    R28: 0000000000000002    R29: 00003fffb4151108
    R30: 00003fffad680000    R31: 0000000000000002
    NIP: 00003fffb408a120    MSR: 800000000280f032    OR3: 0000000000000001
    CTR: 0000000000000000    LR:  00003fffb4015704    XER: 0000000000000000
    CCR: 0000000048222882    MQ:  0000000000000001    DAR: 00003fffad680000
    DSISR: 0000000042000000     Syscall Result: 0000000000000000

And from the same kdump, here's a non-panicking active task with the current
way of doing things:

   PID: 0      TASK: c000000001241c00  CPU: 0   COMMAND: "swapper/0"
    #0 [c0000001dffdfb90] .crash_ipi_callback at c00000000004fd44
    #1 [c0000001dffdfc20] .smp_ipi_demux at c000000000046bf8
    #2 [c0000001dffdfcb0] .icp_hv_ipi_action at c000000000073454
    #3 [c0000001dffdfd30] .handle_irq_event_percpu at c0000000001afaa4
    #4 [c0000001dffdfe10] .handle_percpu_irq at c0000000001b526c
    #5 [c0000001dffdfe90] .generic_handle_irq at c0000000001aed1c
    #6 [c0000001dffdff10] .__do_irq at c000000000010d44
    #7 [c0000001dffdff90] .call_do_irq at c000000000023f60
    #8 [c00000000130b7e0] .do_IRQ at c000000000010eec
    #9 [c00000000130b880] hardware_interrupt_common at c000000000002614
    Hardware Interrupt [501] exception frame:
    R0:  0000000000000000    R1:  c00000000130bb70    R2:  c00000000130a780
    R3:  0000000000000000    R4:  0000000000000000    R5:  800000000bb71120
    R6:  800000000bb844f8    R7:  0000000000000000    R8:  0000000000000000
    R9:  0000000000000040    R10: 0000000000000000    R11: 000000005f9c862a
    R12: 0000000000000000    R13: c000000007b80000
    NIP: c0000000000849b4    MSR: 8000000000009032    OR3: 0000000000000c00
    CTR: 0000000000000000    LR:  c000000000710070    XER: 0000000000000000
    CCR: 0000000024002084    MQ:  0000000000000001    DAR: c000000001818380
    DSISR: c000000000157684     Syscall Result: 0000000000000000
   #10 [c00000000130bb70] .plpar_hcall_norets at c0000000000849b4
   [Link Register] [c00000000130bb70] .shared_cede_loop at c000000000710070
   #11 [c00000000130bbf0] .cpuidle_idle_call at c00000000070d9b4
   #12 [c00000000130bcc0] .pseries_lpar_idle at c0000000000872f0
   #13 [c00000000130bd30] .arch_cpu_idle at c000000000017b44
   #14 [c00000000130bdb0] .cpu_startup_entry at c000000000149b10
   #15 [c00000000130be80] .rest_init at c00000000000c5f4
   #16 [c00000000130bef0] .start_kernel at c000000000c34258
   #17 [c00000000130bf90] start_here_common at c000000000009b6c

and here with your patch applied:

   PID: 0      TASK: c000000001241c00  CPU: 0   COMMAND: "swapper/0"
    R0:  0000000000000000    R1:  c00000000130bb70    R2:  c00000000130a780
    R3:  0000000000000000    R4:  0000000000000000    R5:  800000000bb71120
    R6:  800000000bb844f8    R7:  0000000000000000    R8:  0000000000000000
    R9:  0000000000000040    R10: 0000000000000000    R11: 000000005f9c862a
    R12: 0000000000000000    R13: c000000007b80000
    NIP: c0000000000849b4    MSR: 8000000000009032    OR3: 0000000000000c00
    CTR: 0000000000000000    LR:  c000000000710070    XER: 0000000000000000
    CCR: 0000000024002084    MQ:  0000000000000001    DAR: c000000001818380
    DSISR: c000000000157684     Syscall Result: 0000000000000000
    NIP [c0000000000849b4] .plpar_hcall_norets
    LR  [c000000000710070] .shared_cede_loop
    #0 [c00000000130bb70] (null) at 3  (unreliable)
    #1 [c00000000130bbf0] .cpuidle_idle_call at c00000000070d9b4
    #2 [c00000000130bcc0] .pseries_lpar_idle at c0000000000872f0
    #3 [c00000000130bd30] .arch_cpu_idle at c000000000017b44
    #4 [c00000000130bdb0] .cpu_startup_entry at c000000000149b10
    #5 [c00000000130be80] .rest_init at c00000000000c5f4
    #6 [c00000000130bef0] .start_kernel at c000000000c34258
    #7 [c00000000130bf90] start_here_common at c000000000009b6c

Is that what you really want?

It would be unfortunate to lose all of that exception information, both
for the panic and for all of the non-panicking active tasks.

Hi Dave,

Unfortunate, yes. But I think the exception information we would lose
relates to crash_ipi_callback, crash_kexec, crash_fadump, or some such,
which may not be significant for debugging?
At least, that was the assumption with which I posted this patch.

Thanks
Hari
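
One way to check that assumption against the outputs quoted above is to list which numbered frames disappear between the two forms of `bt` output. The sketch below is only an illustration: `frame_symbols` and `dropped_frames` are hypothetical helpers, the regular expression assumes the "#N [address] symbol at address" layout seen in this thread, and the two samples are the numbered frames of the FADUMP panic task with the register dumps left out.

```python
import re
from typing import List

# Matches numbered frames such as
#   "#4 [c000000255933ae0] .sysrq_handle_crash at c00000000039c57c"
FRAME_RE = re.compile(r"^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+")


def frame_symbols(bt_text: str) -> List[str]:
    """Return the symbols of the numbered frames, in order."""
    symbols = []
    for line in bt_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            symbols.append(match.group(1))
    return symbols


def dropped_frames(before: str, after: str) -> List[str]:
    """Symbols that are numbered frames in `before` but not in `after`."""
    kept = set(frame_symbols(after))
    return [sym for sym in frame_symbols(before) if sym not in kept]


BEFORE = """
 #0 [c000000255933620] .crash_fadump at c00000000002cbb8
 #1 [c0000002559336c0] .die at c000000000030dc8
 #2 [c000000255933770] .bad_page_fault at c000000000043748
 #3 [c0000002559337f0] handle_page_fault at c000000000005228
 #4 [c000000255933ae0] .sysrq_handle_crash at c00000000039c57c
 #5 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
"""

AFTER = """
 NIP [c00000000039c57c] .sysrq_handle_crash
 #0 [c000000255933ae0] .__handle_sysrq at c00000000039c89c
 #1 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
"""

print(dropped_frames(BEFORE, AFTER))
```

For this sample the list is .crash_fadump, .die, .bad_page_fault, handle_page_fault and .sysrq_handle_crash. The last of these still shows up on the NIP line of the patched output, just not as a numbered frame.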
