Re: [PATCH] ppc64: fix 'bt' command for vmcore captured with fadump.

----- Original Message -----
> Without this patch, backtraces of active tasks may be of the form
> "#0 [c0000000700b3a90] (null) at c0000000700b3b50  (unreliable)" for
> kernel dumps captured with fadump.  Trying to use ptregs saved for
> active tasks before falling back to stack-search method. Also, getting
> rid of warnings like "‘is_hugepage’ declared inline after being called".
> 
> Signed-off-by: Hari Bathini <hbathini@xxxxxxxxxxxxxxxxxx>

Hari,

I only have one sample vmcore generated by FADUMP, and with your patch
the backtraces of the non-panicking active tasks are an improvement,
given that they now show the exception frame register set.  However, I
also note that the panic task backtrace has changed, from this using the
current method:

  PID: 1913   TASK: c000000250472120  CPU: 5   COMMAND: "bash"
   #0 [c000000255933620] .crash_fadump at c00000000002cbb8
   #1 [c0000002559336c0] .die at c000000000030dc8
   #2 [c000000255933770] .bad_page_fault at c000000000043748
   #3 [c0000002559337f0] handle_page_fault at c000000000005228
   Data Access [300] exception frame:
   R0:  0000000000000001    R1:  c000000255933ae0    R2:  c000000000f27628   
   R3:  0000000000000063    R4:  0000000000000000    R5:  ffffffffffffffff   
   R6:  0000000000000070    R7:  00000000000020b8    R8:  000000001cbbfaa8   
   R9:  0000000000000000    R10: 0000000000000002    R11: c00000000039c590   
   R12: 0000000028242482    R13: c000000000ff3180    R14: 000000001012b3dc   
   R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58   
   R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000   
   R21: 000000001012b3e4    R22: 0000000000000000    R23: c000000000e57788   
   R24: 0000000000000004    R25: c000000000e57928    R26: c000000000e37414   
   R27: 0000000000000000    R28: 0000000000000001    R29: 0000000000000063   
   R30: c000000000ec9208    R31: c000000001423aac   
   NIP: c00000000039c57c    MSR: 8000000000009032    OR3: c000000255933a20
   CTR: c00000000039c560    LR:  c00000000039c8c8    XER: 0000000000000001
   CCR: 0000000028242482    MQ:  0000000000000000    DAR: 0000000000000000
   DSISR: 0000000042000000     Syscall Result: 0000000000000000
   #4 [c000000255933ae0] .sysrq_handle_crash at c00000000039c57c
   [Link Register] [c000000255933ae0] .__handle_sysrq at c00000000039c8c8
   #5 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
   #6 [c000000255933c30] .proc_reg_write at c000000000244874
   #7 [c000000255933ce0] .vfs_write at c0000000001c9dac
   #8 [c000000255933d80] .sys_write at c0000000001c9fd8
   #9 [c000000255933e30] syscall_exit at c000000000008564
   System Call [c00] exception frame:
   R0:  0000000000000004    R1:  00000fffec87b540    R2:  00000080cec13268   
   R3:  0000000000000001    R4:  00000fffa55a0000    R5:  0000000000000002   
   R6:  000000007fffffff    R7:  0000000000000000    R8:  0000000000000001   
   R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000   
   R12: 0000000000000000    R13: 00000080cea0ce10    R14: 000000001012b3dc   
   R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58   
   R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000   
   R21: 000000001012b3e4    R22: 000001003391c720    R23: 0000000000000000   
   R24: 0000000000000001    R25: 000000001012b3e0    R26: 00000fffec87b86c   
   R27: 00000fffec87b868    R28: 0000000000000002    R29: 00000080cec006a0   
   R30: 00000fffa55a0000    R31: 0000000000000002   
   NIP: 00000080ceb49548    MSR: 800000000000d032    OR3: 0000000000000001
   CTR: 00000080cead9d50    LR:  00000080cead9db8    XER: 0000000000000000
   CCR: 0000000044242424    MQ:  0000000000000001    DAR: 00000100339436b8
   DSISR: 0000000042000000     Syscall Result: 0000000000000000
  
to this with your patch, where the exception backtrace is missing:

  PID: 1913   TASK: c000000250472120  CPU: 5   COMMAND: "bash"
   R0:  0000000000000001    R1:  c000000255933ae0    R2:  c000000000f27628   
   R3:  0000000000000063    R4:  0000000000000000    R5:  ffffffffffffffff   
   R6:  0000000000000070    R7:  00000000000020b8    R8:  000000001cbbfaa8   
   R9:  0000000000000000    R10: 0000000000000002    R11: c00000000039c590   
   R12: 0000000028242482    R13: c000000000ff3180    R14: 000000001012b3dc   
   R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58   
   R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000   
   R21: 000000001012b3e4    R22: 0000000000000000    R23: c000000000e57788   
   R24: 0000000000000004    R25: c000000000e57928    R26: c000000000e37414   
   R27: 0000000000000000    R28: 0000000000000001    R29: 0000000000000063   
   R30: c000000000ec9208    R31: c000000001423aac   
   NIP: c00000000039c57c    MSR: 8000000000009032    OR3: c000000255933a20
   CTR: c00000000039c560    LR:  c00000000039c8c8    XER: 0000000000000001
   CCR: 0000000028242482    MQ:  0000000000000000    DAR: 0000000000000000
   DSISR: 0000000042000000     Syscall Result: 0000000000000000
   NIP [c00000000039c57c] .sysrq_handle_crash
   LR  [c00000000039c8c8] .__handle_sysrq
   #0 [c000000255933ae0] .__handle_sysrq at c00000000039c89c
   #1 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
   #2 [c000000255933c30] .proc_reg_write at c000000000244874
   #3 [c000000255933ce0] .vfs_write at c0000000001c9dac
   #4 [c000000255933d80] .sys_write at c0000000001c9fd8
   #5 [c000000255933e30] syscall_exit at c000000000008564
   System Call [c00] exception frame:
   R0:  0000000000000004    R1:  00000fffec87b540    R2:  00000080cec13268   
   R3:  0000000000000001    R4:  00000fffa55a0000    R5:  0000000000000002   
   R6:  000000007fffffff    R7:  0000000000000000    R8:  0000000000000001   
   R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000   
   R12: 0000000000000000    R13: 00000080cea0ce10    R14: 000000001012b3dc   
   R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58   
   R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000   
   R21: 000000001012b3e4    R22: 000001003391c720    R23: 0000000000000000   
   R24: 0000000000000001    R25: 000000001012b3e0    R26: 00000fffec87b86c   
   R27: 00000fffec87b868    R28: 0000000000000002    R29: 00000080cec006a0   
   R30: 00000fffa55a0000    R31: 0000000000000002   
   NIP: 00000080ceb49548    MSR: 800000000000d032    OR3: 0000000000000001
   CTR: 00000080cead9d50    LR:  00000080cead9db8    XER: 0000000000000000
   CCR: 0000000044242424    MQ:  0000000000000001    DAR: 00000100339436b8
   DSISR: 0000000042000000     Syscall Result: 0000000000000000


  
And then on a RHEL7 traditional KDUMP dumpfile, with your patch applied,
both the panic task and the non-panicking active tasks are missing the
exception backtrace.  Here's a sample panic task backtrace using the
current method:

  PID: 32696  TASK: c0000001922ed5d0  CPU: 1   COMMAND: "runtest.sh"
   #0 [c000000019823610] .crash_kexec at c0000000001725e0
   #1 [c000000019823810] .die at c000000000020a48
   #2 [c0000000198238c0] .bad_page_fault at c0000000000530d8
   #3 [c000000019823940] handle_page_fault at c000000000009584
   Data Access [300] exception frame:
   R0:  c00000000055cf88    R1:  c000000019823c30    R2:  c00000000130a780   
   R3:  0000000000000063    R4:  c000000001845888    R5:  c0000000018564f8   
   R6:  0000000000005194    R7:  c0000000014b99a0    R8:  c000000000cca780   
   R9:  0000000000000001    R10: 0000000000000000    R11: 000000000000012f   
   R12: 0000000048222842    R13: c000000007b80900    R14: 0000000010142550   
   R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000   
   R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0   
   R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000   
   R24: 0000000000000001    R25: 0000000000000007    R26: c00000000120b170   
   R27: 0000000000000063    R28: c000000001709c98    R29: c00000000120b530   
   R30: c0000000011d8fa0    R31: 0000000000000002   
   NIP: c00000000055c3f8    MSR: 8000000000009032    OR3: c000000000009358
   CTR: c00000000055c3e0    LR:  c00000000055cfac    XER: 0000000000000001
   CCR: 0000000048222822    MQ:  0000000000000000    DAR: 0000000000000000
   DSISR: 0000000042000000     Syscall Result: 0000000000000000
   #4 [c000000019823c30] .sysrq_handle_crash at c00000000055c3f8
   [Link Register] [c000000019823c30] .write_sysrq_trigger at c00000000055cfac
   #5 [c000000019823cf0] .proc_reg_write at c00000000037d120
   #6 [c000000019823d80] .sys_write at c0000000002d68e4
   #7 [c000000019823e30] syscall_exit at c00000000000a17c
   System Call [c00] exception frame:
   R0:  0000000000000004    R1:  00003fffc7738e00    R2:  00003fffb4163cc0   
   R3:  0000000000000001    R4:  00003fffad680000    R5:  0000000000000002   
   R6:  0000000000000010    R7:  0000000000000000    R8:  0000000000000000   
   R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000   
   R12: 0000000000000000    R13: 00003fffb426c330    R14: 0000000010142550   
   R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000   
   R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0   
   R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000   
   R24: 0000000010143ce0    R25: 00000000100f65d0    R26: 00000100277ffa20   
   R27: 0000000000000001    R28: 0000000000000002    R29: 00003fffb4151108   
   R30: 00003fffad680000    R31: 0000000000000002   
   NIP: 00003fffb408a120    MSR: 800000000280f032    OR3: 0000000000000001
   CTR: 0000000000000000    LR:  00003fffb4015704    XER: 0000000000000000
   CCR: 0000000048222882    MQ:  0000000000000001    DAR: 00003fffad680000
   DSISR: 0000000042000000     Syscall Result: 0000000000000000

And here it is with your patch:

  PID: 32696  TASK: c0000001922ed5d0  CPU: 1   COMMAND: "runtest.sh"
   R0:  c00000000055cf88    R1:  c000000019823c30    R2:  c00000000130a780   
   R3:  0000000000000063    R4:  c000000001845888    R5:  c0000000018564f8   
   R6:  0000000000005194    R7:  c0000000014b99a0    R8:  c000000000cca780   
   R9:  0000000000000001    R10: 0000000000000000    R11: 000000000000012f   
   R12: 0000000048222842    R13: c000000007b80900    R14: 0000000010142550   
   R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000   
   R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0   
   R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000   
   R24: 0000000000000001    R25: 0000000000000007    R26: c00000000120b170   
   R27: 0000000000000063    R28: c000000001709c98    R29: c00000000120b530   
   R30: c0000000011d8fa0    R31: 0000000000000002   
   NIP: c00000000055c3f8    MSR: 8000000000009032    OR3: c000000000009358
   CTR: c00000000055c3e0    LR:  c00000000055cfac    XER: 0000000000000001
   CCR: 0000000048222822    MQ:  0000000000000000    DAR: 0000000000000000
   DSISR: 0000000042000000     Syscall Result: 0000000000000000
   NIP [c00000000055c3f8] .sysrq_handle_crash
   LR  [c00000000055cfac] .write_sysrq_trigger
   #0 [c000000019823c30] .write_sysrq_trigger at c00000000055cf88
   #1 [c000000019823cf0] .proc_reg_write at c00000000037d120
   #2 [c000000019823d80] .sys_write at c0000000002d68e4
   #3 [c000000019823e30] syscall_exit at c00000000000a17c
   System Call [c00] exception frame:
   R0:  0000000000000004    R1:  00003fffc7738e00    R2:  00003fffb4163cc0   
   R3:  0000000000000001    R4:  00003fffad680000    R5:  0000000000000002   
   R6:  0000000000000010    R7:  0000000000000000    R8:  0000000000000000   
   R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000   
   R12: 0000000000000000    R13: 00003fffb426c330    R14: 0000000010142550   
   R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000   
   R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0   
   R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000   
   R24: 0000000010143ce0    R25: 00000000100f65d0    R26: 00000100277ffa20   
   R27: 0000000000000001    R28: 0000000000000002    R29: 00003fffb4151108   
   R30: 00003fffad680000    R31: 0000000000000002   
   NIP: 00003fffb408a120    MSR: 800000000280f032    OR3: 0000000000000001
   CTR: 0000000000000000    LR:  00003fffb4015704    XER: 0000000000000000
   CCR: 0000000048222882    MQ:  0000000000000001    DAR: 00003fffad680000
   DSISR: 0000000042000000     Syscall Result: 0000000000000000

And from the same kdump, here's a non-panicking active task with the current 
way of doing things:

  PID: 0      TASK: c000000001241c00  CPU: 0   COMMAND: "swapper/0"
   #0 [c0000001dffdfb90] .crash_ipi_callback at c00000000004fd44
   #1 [c0000001dffdfc20] .smp_ipi_demux at c000000000046bf8
   #2 [c0000001dffdfcb0] .icp_hv_ipi_action at c000000000073454
   #3 [c0000001dffdfd30] .handle_irq_event_percpu at c0000000001afaa4
   #4 [c0000001dffdfe10] .handle_percpu_irq at c0000000001b526c
   #5 [c0000001dffdfe90] .generic_handle_irq at c0000000001aed1c
   #6 [c0000001dffdff10] .__do_irq at c000000000010d44
   #7 [c0000001dffdff90] .call_do_irq at c000000000023f60
   #8 [c00000000130b7e0] .do_IRQ at c000000000010eec
   #9 [c00000000130b880] hardware_interrupt_common at c000000000002614
   Hardware Interrupt [501] exception frame:
   R0:  0000000000000000    R1:  c00000000130bb70    R2:  c00000000130a780   
   R3:  0000000000000000    R4:  0000000000000000    R5:  800000000bb71120   
   R6:  800000000bb844f8    R7:  0000000000000000    R8:  0000000000000000   
   R9:  0000000000000040    R10: 0000000000000000    R11: 000000005f9c862a   
   R12: 0000000000000000    R13: c000000007b80000   
   NIP: c0000000000849b4    MSR: 8000000000009032    OR3: 0000000000000c00
   CTR: 0000000000000000    LR:  c000000000710070    XER: 0000000000000000
   CCR: 0000000024002084    MQ:  0000000000000001    DAR: c000000001818380
   DSISR: c000000000157684     Syscall Result: 0000000000000000
  #10 [c00000000130bb70] .plpar_hcall_norets at c0000000000849b4
  [Link Register] [c00000000130bb70] .shared_cede_loop at c000000000710070
  #11 [c00000000130bbf0] .cpuidle_idle_call at c00000000070d9b4
  #12 [c00000000130bcc0] .pseries_lpar_idle at c0000000000872f0
  #13 [c00000000130bd30] .arch_cpu_idle at c000000000017b44
  #14 [c00000000130bdb0] .cpu_startup_entry at c000000000149b10
  #15 [c00000000130be80] .rest_init at c00000000000c5f4
  #16 [c00000000130bef0] .start_kernel at c000000000c34258
  #17 [c00000000130bf90] start_here_common at c000000000009b6c

and here with your patch applied:

  PID: 0      TASK: c000000001241c00  CPU: 0   COMMAND: "swapper/0"
   R0:  0000000000000000    R1:  c00000000130bb70    R2:  c00000000130a780   
   R3:  0000000000000000    R4:  0000000000000000    R5:  800000000bb71120   
   R6:  800000000bb844f8    R7:  0000000000000000    R8:  0000000000000000   
   R9:  0000000000000040    R10: 0000000000000000    R11: 000000005f9c862a   
   R12: 0000000000000000    R13: c000000007b80000   
   NIP: c0000000000849b4    MSR: 8000000000009032    OR3: 0000000000000c00
   CTR: 0000000000000000    LR:  c000000000710070    XER: 0000000000000000
   CCR: 0000000024002084    MQ:  0000000000000001    DAR: c000000001818380
   DSISR: c000000000157684     Syscall Result: 0000000000000000
   NIP [c0000000000849b4] .plpar_hcall_norets
   LR  [c000000000710070] .shared_cede_loop
   #0 [c00000000130bb70] (null) at 3  (unreliable)
   #1 [c00000000130bbf0] .cpuidle_idle_call at c00000000070d9b4
   #2 [c00000000130bcc0] .pseries_lpar_idle at c0000000000872f0
   #3 [c00000000130bd30] .arch_cpu_idle at c000000000017b44
   #4 [c00000000130bdb0] .cpu_startup_entry at c000000000149b10
   #5 [c00000000130be80] .rest_init at c00000000000c5f4
   #6 [c00000000130bef0] .start_kernel at c000000000c34258
   #7 [c00000000130bf90] start_here_common at c000000000009b6c

Is that what you really want?

It would be unfortunate to lose all of that exception information, both
for the panic and for all of the non-panicking active tasks. 
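
To make the loss concrete, a comparison like the ones above could be
automated.  What follows is only a minimal Python sketch, not part of crash:
frame_symbols() and lost_frames() are hypothetical helper names, and the
regular expression assumes that frame lines look like the
"#N [stack-addr] symbol at addr" lines shown in this message.  The sample
text is abridged from the FADUMP panic task output above.

```python
import re

# Matches crash 'bt' frame lines such as
#   "#4 [c000000255933ae0] .sysrq_handle_crash at c00000000039c57c"
# (assumed layout, taken from the output quoted in this message).
FRAME_RE = re.compile(r"^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+\S+")


def frame_symbols(bt_text):
    """Return the symbol of each numbered frame, in order of appearance."""
    symbols = []
    for line in bt_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            symbols.append(match.group(1))
    return symbols


def lost_frames(before, after):
    """Return symbols that are frames in 'before' but not frames in 'after'."""
    kept = set(frame_symbols(after))
    return [sym for sym in frame_symbols(before) if sym not in kept]


# Abridged from the FADUMP panic task backtraces (current method vs. patch).
BEFORE = """\
 #0 [c000000255933620] .crash_fadump at c00000000002cbb8
 #1 [c0000002559336c0] .die at c000000000030dc8
 #2 [c000000255933770] .bad_page_fault at c000000000043748
 #3 [c0000002559337f0] handle_page_fault at c000000000005228
 #4 [c000000255933ae0] .sysrq_handle_crash at c00000000039c57c
 #5 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
"""
AFTER = """\
 NIP [c00000000039c57c] .sysrq_handle_crash
 #0 [c000000255933ae0] .__handle_sysrq at c00000000039c89c
 #1 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
"""

if __name__ == "__main__":
    for symbol in lost_frames(BEFORE, AFTER):
        print("lost frame:", symbol)
```

On the abridged sample this reports the five frames from .crash_fadump
through .sysrq_handle_crash.  The NIP line is not counted as a frame, since
it carries no frame number.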

Would it be possible to apply your changes only to FADUMP dumpfiles
(and possibly to resurrect the missing exception backtrace for the FADUMP
panic task)?
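
As a rough illustration of the kind of gating I have in mind, here is a
small Python sketch of the decision only.  It is not crash code: DumpType,
choose_bt_method() and its parameters are hypothetical names invented for
this example.  Keeping the existing stack-search method for the FADUMP
panic task is just one possible way to keep its exception backtrace, not
something the patch or crash currently does.

```python
from enum import Enum


class DumpType(Enum):
    """Hypothetical dumpfile classification; not crash's own flags."""
    KDUMP = "kdump"
    FADUMP = "fadump"


def choose_bt_method(dump_type, is_active, is_panic_task, has_saved_ptregs):
    """Pick where a task's backtrace should start.

    Returns "ptregs" to start from the saved register set, or
    "stack-search" to keep the existing stack-search method.
    """
    # Without saved registers, or for a task that was not running on a
    # CPU, there is nothing new to use.
    if not is_active or not has_saved_ptregs:
        return "stack-search"
    # Assumption: only FADUMP non-panicking active tasks switch methods,
    # so KDUMP output and the FADUMP panic task stay as they are today.
    if dump_type is DumpType.FADUMP and not is_panic_task:
        return "ptregs"
    return "stack-search"


if __name__ == "__main__":
    cases = [
        (DumpType.FADUMP, True, False, True),
        (DumpType.FADUMP, True, True, True),
        (DumpType.KDUMP, True, False, True),
    ]
    for case in cases:
        print(case[0].value, case[1:], "->", choose_bt_method(*case))
```

With a check along these lines, the KDUMP backtraces shown above would be
left unchanged, and only the FADUMP active tasks that currently come out as
"(null) ... (unreliable)" would be affected.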

Dave


--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility



