Re: [PATCH] Fix for "bt" command prints a bogus warning

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023/02/06 23:43, lijiang wrote:
>>>> On Mon, Feb 6, 2023 at 2:19 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@xxxxxxx <mailto:k-hagio-ab@xxxxxxx>> wrote:
>>>>
>>>>
>>>>     On 2023/02/06 11:04, lijiang wrote:
>>>>
>>>>     >> I tried to hardcode and get the expected eframe information, but the backtrace is incorrect:
>>>>
>>>>     ok, it looks the expected one.  The irq_eframe_link value has been
>>>>     hardcoded for a long time, it might be hard to calculate it in the
>>>>     current implementation..
>>>>
>>>>     btw, what is the kernel version of your vmcore this case?
>>>>
>>>>
>>>> My kernel version is upstream kernel 5.7.

The kernel 5.7 doesn't have fa5e5c409213, so the patch is added?

$ git describe --contains fa5e5c409213
v5.8-rc1~21^2~43

>>>> For my side, the irq_eframe_link is set to -56 as below:

I see, thanks for trying.

>>>>
>>>> diff --git a/x86_64.c b/x86_64.c
>>>> index 7a5d6f050c89..61ecf8dbdb65 100644
>>>> --- a/x86_64.c
>>>> +++ b/x86_64.c
>>>> @@ -3938,6 +3938,11 @@ in_exception_stack:
>>>>          if (irq_eframe) {
>>>>                  bt->flags |= BT_EXCEPTION_FRAME;
>>>>                  i = (irq_eframe - bt->stackbase)/sizeof(ulong);
>>>> +               if (symbol_exists("asm_common_interrupt")) {
>>>> +                       i -= 1;
>>>> +                       up = (ulong *)(&bt->stackbuf[i*sizeof(ulong)]);
>>>> +                       bt->instptr = *up;
>>>> +               }
>>>>                  x86_64_print_stack_entry(bt, ofp, level, i, bt->instptr);
>>>>                  bt->flags &= ~(ulonglong)BT_EXCEPTION_FRAME;
>>>>                  cs = x86_64_exception_frame(EFRAME_PRINT|EFRAME_CS, 0,
>>>> @@ -6521,6 +6526,11 @@ x86_64_irq_eframe_link_init(void)
>>>>         else
>>>>                 return;
>>>>
>>>> +       if (symbol_exists("asm_common_interrupt")) {
>>>> +               machdep->machspec->irq_eframe_link = -56;
>>>> +               return;
>>>> +       }
>>>> +
>>>>         if (THIS_KERNEL_VERSION < LINUX(2,6,9))
>>>>                 return;
>>>>
>>>> And get the following results:
>>>> crash> bt
>>>> PID: 0        TASK: ffff9e7a47e32f00  CPU: 3    COMMAND: "swapper/3"
>>>>  #0 [ffffba7900118bb8] machine_kexec at ffffffff87e5c2c7
>>>>  #1 [ffffba7900118c08] __crash_kexec at ffffffff87f9500d
>>>>  #2 [ffffba7900118cd0] panic at ffffffff87edfff9
>>>>  #3 [ffffba7900118d50] sysrq_handle_crash at ffffffff883ce2c1
>>>>  #4 [ffffba7900118d58] __handle_sysrq.cold.15 at ffffffff883ceb56
>>>>  #5 [ffffba7900118d88] sysrq_filter at ffffffff883ce9a2
>>>>  #6 [ffffba7900118dc0] input_to_handler at ffffffff884fb0bf
>>>>  #7 [ffffba7900118df8] input_pass_values at ffffffff884fc1b7
>>>>  #8 [ffffba7900118e20] input_handle_event at ffffffff884fe278
>>>>  #9 [ffffba7900118e50] input_event at ffffffff884fe74b
>>>> #10 [ffffba7900118e88] atkbd_interrupt at ffffffff88504e2f
>>>> #11 [ffffba7900118ee0] serio_interrupt at ffffffff884f7516
>>>> #12 [ffffba7900118f10] i8042_interrupt at ffffffff884f8b04
>>>> #13 [ffffba7900118f50] __handle_irq_event_percpu at ffffffff87f51430
>>>> #14 [ffffba7900118f90] handle_irq_event_percpu at ffffffff87f51590
>>>> #15 [ffffba7900118fb8] handle_irq_event at ffffffff87f51616
>>>> #16 [ffffba7900118fd8] handle_edge_irq at ffffffff87f559f2
>>>> #17 [ffffba7900118ff0] asm_call_on_stack at ffffffff88800fa2
>>>> --- <IRQ stack> ---
>>>> #18 [ffffba790008be08] asm_common_interrupt at ffffffff88800c1e
>>>>     [exception RIP: default_idle+37]
>>>>     RIP: ffffffff8876ffc5  RSP: ffffba790008beb8  RFLAGS: 00000246
>>>>     RAX: ffffffff8876ffa0  RBX: 0000000000000003  RCX: 0000000000000001
>>>>     RDX: 000000000001a6de  RSI: 0000000000000087  RDI: 0000000000000003
>>>>     RBP: 0000000000000003   R8: 000000146ccbc1f0   R9: 0000000000000000
>>>>     R10: 0000000000000000  R11: 0000000000000124  R12: ffffffffffffffff
>>>>     R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
>>>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>> #19 [ffffba790008bed8] do_idle at ffffffff87f1982d
>>>> #20 [ffffba790008bf20] cpu_startup_entry at ffffffff87f19a29
>>>> #21 [ffffba790008bf30] start_secondary at ffffffff87e5118f
>>>> #22 [ffffba790008bf50] secondary_startup_64 at ffffffff87e000e6
>>>>
>>>> The above eframe information looks correct, but the backtrace seems problematic, it doesn't display the
>>>> common_interrupt() according to the kernel log:

yes, it's expected.  The current crash unwinder prints only one return
address with the exception frame.  Probably we can show them more, but
more important thing is how we determine the irq_eframe_link.

I will also think about it, could I have some outputs for reference?

crash> rd ffffba7900118fd0 4

crash> dis asm_common_interrupt
crash> dis common_interrupt
crash> dis asm_call_on_stack

crash> bt -D -5 -I asm_common_interrupt+30
crash> bt -D -5 -I common_interrupt+181
crash> bt -D -5 -I ffffffff88800fa2

Thanks,
Kazu

>>>>
>>>> [   57.552655] Call Trace:
>>>> [   57.553994]  <IRQ>
>>>> [   57.555143]  dump_stack+0x57/0x70
>>>> [   57.556930]  panic+0xfb/0x2d7
>>>> [   57.558516]  ? printk+0x58/0x6f
>>>> [   57.560194]  sysrq_handle_crash+0x11/0x20
>>>> [   57.562371]  __handle_sysrq.cold.15+0x48/0x102
>>>> [   57.564663]  sysrq_filter+0x312/0x3c0
>>>> [   57.566579]  input_to_handler+0x4f/0xf0
>>>> [   57.568583]  input_pass_values.part.7+0x117/0x130
>>>> [   57.571188]  input_handle_event+0x138/0x5c0
>>>> [   57.573380]  input_event+0x4b/0x70
>>>> [   57.575173]  atkbd_interrupt+0x61f/0x6a0
>>>> [   57.577228]  serio_interrupt+0x46/0x90
>>>> [   57.579192]  i8042_interrupt+0x154/0x240
>>>> [   57.581328]  __handle_irq_event_percpu+0x40/0x170
>>>> [   57.583788]  handle_irq_event_percpu+0x30/0x80
>>>> [   57.586081]  handle_irq_event+0x36/0x53
>>>> [   57.588071]  handle_edge_irq+0x82/0x190
>>>> [   57.590096]  asm_call_on_stack+0x12/0x20
>>>> [   57.592142]  </IRQ>
>>>> [   57.593264]  common_interrupt+0xb5/0x140
>>>> [   57.595311]  asm_common_interrupt+0x1e/0x40
>>>> [   57.597474] RIP: 0010:default_idle+0x25/0x150
>>>> [   57.599735] Code: cc cc cc cc cc 66 66 66 66 90 41 55 41 54 55 65 8b 2d af 23 8a 77 53 66 66 66 66 90 e9 07 00 00 00 0f 00 2d 9f 37 49 00 fb f4 <65> 8b 2d 94 23 8a 77 66 66 66 66 90 5b 5d 41 5c 41 5d c3 65 8b 05
>>>> [   57.609196] RSP: 0018:ffffba790008beb8 EFLAGS: 00000246
>>>> [   57.611875] RAX: ffffffff8876ffa0 RBX: 0000000000000003 RCX: 0000000000000001
>>>> [   57.615506] RDX: 000000000001a6de RSI: 0000000000000087 RDI: 0000000000000003
>>>> [   57.619142] RBP: 0000000000000003 R08: 000000146ccbc1f0 R09: 0000000000000000
>>>> [   57.622770] R10: 0000000000000000 R11: 0000000000000124 R12: ffffffffffffffff
>>>> [   57.626386] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>>> [   57.630029]  ? __cpuidle_text_start+0x8/0x8
>>>> [   57.632209]  do_idle+0x1bd/0x240
>>>> [   57.633914]  cpu_startup_entry+0x19/0x20
>>>> [   57.635966]  start_secondary+0x15f/0x1b0
>>>> [   57.638021]  secondary_startup_64+0xb6/0xc0
>>>> crash>
>>>>
>>>> But I can see it in the raw stack data:
>>>>
>>>> ffffba790008bdc0:  0000000000000001 0000000000000001
>>>> ffffba790008bdd0: common_interrupt+181 0000000000000000  <---- Why is it not in the backtrace?
>>>> ffffba790008bde0:  0000000000000000 0000000000000000
>>>> ffffba790008bdf0:  0000000000000000 0000000000000000
>>>> ffffba790008be00: asm_common_interrupt+30 0000000000000000
>>>> ffffba790008be10:  0000000000000000 0000000000000000
>>>> ffffba790008be20:  ffffffffffffffff 0000000000000003
>>>> ffffba790008be30:  0000000000000003 0000000000000124
>>>> ffffba790008be40:  0000000000000000 0000000000000000
>>>> ffffba790008be50:  000000146ccbc1f0 default_idle
>>>> ffffba790008be60:  0000000000000001 000000000001a6de
>>>> ffffba790008be70:  0000000000000087 0000000000000003
>>>> ffffba790008be80:  ffffffffffffffff default_idle+37
>>>> ffffba790008be90:  0000000000000010 0000000000000246
>>>> ffffba790008bea0:  ffffba790008beb8 0000000000000018
>>>> ffffba790008beb0:  0000000000000003 0000000000000003
>>>> ffffba790008bec0:  __cpu_online_mask ffffffffffffffff
>>>> ffffba790008bed0:  0000000000000000 do_idle+445
>>>> ffffba790008bee0:  ffffffffffffff04 c4a71aecbd899c00
>>>> ffffba790008bef0:  000000000000008f 0000000000000000
>>>> ffffba790008bf00:  0000000000000000 0000000000000000
>>>> ffffba790008bf10:  0000000000000000 0000000000000000
>>>> ffffba790008bf20:  cpu_startup_entry+25 ffffba790008bf38
>>>> ffffba790008bf30:  start_secondary+351 c4a71aecbd899c00
>>>> ffffba790008bf40:  0000000000000000 0000000000000000
>>>> ffffba790008bf50:  secondary_startup_64+182 0000000000000000
>>>>
>>>>
>>>>     And for RHEL9, this trial patch looks good, how does this work for
>>>>     your vmcore?
>>>>     https://github.com/k-hagio/crash/commit/0719360a2b76b3c03b1f7ad06caf2ce14e2bfcbf.patch <https://github.com/k-hagio/crash/commit/0719360a2b76b3c03b1f7ad06caf2ce14e2bfcbf.patch>
>>>>
>>>> Unfortunately, it still doesn't work(with the above patch):
>>>>
>>>> crash> bt
>>>> PID: 0        TASK: ffff9e7a47e32f00  CPU: 3    COMMAND: "swapper/3"
>>>>  #0 [ffffba7900118bb8] machine_kexec at ffffffff87e5c2c7
>>>>  #1 [ffffba7900118c08] __crash_kexec at ffffffff87f9500d
>>>>  #2 [ffffba7900118cd0] panic at ffffffff87edfff9
>>>>  #3 [ffffba7900118d50] sysrq_handle_crash at ffffffff883ce2c1
>>>>  #4 [ffffba7900118d58] __handle_sysrq.cold.15 at ffffffff883ceb56
>>>>  #5 [ffffba7900118d88] sysrq_filter at ffffffff883ce9a2
>>>>  #6 [ffffba7900118dc0] input_to_handler at ffffffff884fb0bf
>>>>  #7 [ffffba7900118df8] input_pass_values at ffffffff884fc1b7
>>>>  #8 [ffffba7900118e20] input_handle_event at ffffffff884fe278
>>>>  #9 [ffffba7900118e50] input_event at ffffffff884fe74b
>>>> #10 [ffffba7900118e88] atkbd_interrupt at ffffffff88504e2f
>>>> #11 [ffffba7900118ee0] serio_interrupt at ffffffff884f7516
>>>> #12 [ffffba7900118f10] i8042_interrupt at ffffffff884f8b04
>>>> #13 [ffffba7900118f50] __handle_irq_event_percpu at ffffffff87f51430
>>>> #14 [ffffba7900118f90] handle_irq_event_percpu at ffffffff87f51590
>>>> #15 [ffffba7900118fb8] handle_irq_event at ffffffff87f51616
>>>> #16 [ffffba7900118fd8] handle_edge_irq at ffffffff87f559f2
>>>> #17 [ffffba7900118ff0] asm_call_on_stack at ffffffff88800fa2
>>>> --- <IRQ stack> ---
>>>>     RIP: 000000000001a6de  RSP: ffffffffffffffff  RFLAGS: 00000003
>>>>     RAX: 0000000000000124  RBX: 0000000000000000  RCX: 0000000000000000
>>>>     RDX: 0000000000000000  RSI: 000000146ccbc1f0  RDI: ffffffff8876ffa0
>>>>     RBP: 0000000000000000   R8: 0000000000000003   R9: 0000000000000003
>>>>     R10: ffffffffffffffff  R11: 0000000000000000  R12: ffffffff88800c1e
>>>>     R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
>>>>     ORIG_RAX: 0000000000000001  CS: 0087  SS: ffffffff8876ffc5
>>>> bt: WARNING: possibly bogus exception frame
>>>> crash>
>>>>
>>>> Thanks.
>>>> Lianbo
>>>>
>>>>     Thanks,
>>>>     Kazu
>>>>
>>>>     >>
>>>>     >> crash> bt
>>>>     >> PID: 0        TASK: ffff9e7a47e32f00  CPU: 3    COMMAND: "swapper/3"
>>>>     >>  #0 [ffffba7900118bb8] machine_kexec at ffffffff87e5c2c7
>>>>     >>  #1 [ffffba7900118c08] __crash_kexec at ffffffff87f9500d
>>>>     >>  #2 [ffffba7900118cd0] panic at ffffffff87edfff9
>>>>     >>  #3 [ffffba7900118d50] sysrq_handle_crash at ffffffff883ce2c1
>>>>     >>  #4 [ffffba7900118d58] __handle_sysrq.cold.15 at ffffffff883ceb56
>>>>     >>  #5 [ffffba7900118d88] sysrq_filter at ffffffff883ce9a2
>>>>     >>  #6 [ffffba7900118dc0] input_to_handler at ffffffff884fb0bf
>>>>     >>  #7 [ffffba7900118df8] input_pass_values at ffffffff884fc1b7
>>>>     >>  #8 [ffffba7900118e20] input_handle_event at ffffffff884fe278
>>>>     >>  #9 [ffffba7900118e50] input_event at ffffffff884fe74b
>>>>     >> #10 [ffffba7900118e88] atkbd_interrupt at ffffffff88504e2f
>>>>     >> #11 [ffffba7900118ee0] serio_interrupt at ffffffff884f7516
>>>>     >> #12 [ffffba7900118f10] i8042_interrupt at ffffffff884f8b04
>>>>     >> #13 [ffffba7900118f50] __handle_irq_event_percpu at ffffffff87f51430
>>>>     >> #14 [ffffba7900118f90] handle_irq_event_percpu at ffffffff87f51590
>>>>     >> #15 [ffffba7900118fb8] handle_irq_event at ffffffff87f51616
>>>>     >> #16 [ffffba7900118fd8] handle_edge_irq at ffffffff87f559f2
>>>>     >> #17 [ffffba7900118ff0] asm_call_on_stack at ffffffff88800fa2
>>>>     >> --- <IRQ stack> ---
>>>>     >> #18 [ffffba790008be08] asm_call_on_stack at ffffffff88800fa2
>>>>     >>     [exception RIP: default_idle+37]
>>>>     >>     RIP: ffffffff8876ffc5  RSP: ffffba790008beb8  RFLAGS: 00000246
>>>>     >>     RAX: ffffffff8876ffa0  RBX: 0000000000000003  RCX: 0000000000000001
>>>>     >>     RDX: 000000000001a6de  RSI: 0000000000000087  RDI: 0000000000000003
>>>>     >>     RBP: 0000000000000003   R8: 000000146ccbc1f0   R9: 0000000000000000
>>>>     >>     R10: 0000000000000000  R11: 0000000000000124  R12: ffffffffffffffff
>>>>     >>     R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
>>>>     >>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>>     >> #19 [ffffba790008bed8] do_idle at ffffffff87f1982d
>>>>     >> #20 [ffffba790008bf20] cpu_startup_entry at ffffffff87f19a29
>>>>     >> #21 [ffffba790008bf30] start_secondary at ffffffff87e5118f
>>>>     >> #22 [ffffba790008bf50] secondary_startup_64 at ffffffff87e000e6
>>>>     >> crash>>> Let me debug it more and to see what happened.
>>>>     >>
>>>>     >>     What is printed as the panic message in "log" command?
>>>>     >>
>>>>     >>
>>>>     >> The kernel panic message:
>>>>     >> [   57.542538] Kernel panic - not syncing: sysrq triggered crash
>>>>     >> [   57.545512] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted 5.7.0+ #7
>>>>     >> [   57.549160] Hardware name: Red Hat KVM/RHEL-AV, BIOS 0.0.0 02/06/2015
>>>>     >> [   57.552655] Call Trace:
>>>>     >> [   57.553994]  <IRQ>
>>>>     >> [   57.555143]  dump_stack+0x57/0x70
>>>>     >> [   57.556930]  panic+0xfb/0x2d7
>>>>     >> [   57.558516]  ? printk+0x58/0x6f
>>>>     >> [   57.560194]  sysrq_handle_crash+0x11/0x20
>>>>     >> [   57.562371]  __handle_sysrq.cold.15+0x48/0x102
>>>>     >> [   57.564663]  sysrq_filter+0x312/0x3c0
>>>>     >> [   57.566579]  input_to_handler+0x4f/0xf0
>>>>     >> [   57.568583]  input_pass_values.part.7+0x117/0x130
>>>>     >> [   57.571188]  input_handle_event+0x138/0x5c0
>>>>     >> [   57.573380]  input_event+0x4b/0x70
>>>>     >> [   57.575173]  atkbd_interrupt+0x61f/0x6a0
>>>>     >> [   57.577228]  serio_interrupt+0x46/0x90
>>>>     >> [   57.579192]  i8042_interrupt+0x154/0x240
>>>>     >> [   57.581328]  __handle_irq_event_percpu+0x40/0x170
>>>>     >> [   57.583788]  handle_irq_event_percpu+0x30/0x80
>>>>     >> [   57.586081]  handle_irq_event+0x36/0x53
>>>>     >> [   57.588071]  handle_edge_irq+0x82/0x190
>>>>     >> [   57.590096]  asm_call_on_stack+0x12/0x20
>>>>     >> [   57.592142]  </IRQ>
>>>>     >> [   57.593264]  common_interrupt+0xb5/0x140
>>>>     >> [   57.595311]  asm_common_interrupt+0x1e/0x40
>>>>     >> [   57.597474] RIP: 0010:default_idle+0x25/0x150
>>>>     >> [   57.599735] Code: cc cc cc cc cc 66 66 66 66 90 41 55 41 54 55 65 8b 2d af 23 8a 77 53 66 66 66 66 90 e9 07 00 00 00 0f 00 2d 9f 37 49 00 fb f4 <65> 8b 2d 94 23 8a 77 66 66 66 66 90 5b 5d 41 5c 41 5d c3 65 8b 05
>>>>     >> [   57.609196] RSP: 0018:ffffba790008beb8 EFLAGS: 00000246
>>>>     >> [   57.611875] RAX: ffffffff8876ffa0 RBX: 0000000000000003 RCX: 0000000000000001
>>>>     >> [   57.615506] RDX: 000000000001a6de RSI: 0000000000000087 RDI: 0000000000000003
>>>>     >> [   57.619142] RBP: 0000000000000003 R08: 000000146ccbc1f0 R09: 0000000000000000
>>>>     >> [   57.622770] R10: 0000000000000000 R11: 0000000000000124 R12: ffffffffffffffff
>>>>     >> [   57.626386] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>>>     >> [   57.630029]  ? __cpuidle_text_start+0x8/0x8
>>>>     >> [   57.632209]  do_idle+0x1bd/0x240
>>>>     >> [   57.633914]  cpu_startup_entry+0x19/0x20
>>>>     >> [   57.635966]  start_secondary+0x15f/0x1b0
>>>>     >> [   57.638021]  secondary_startup_64+0xb6/0xc0
>>>>     >> Thanks.
>>>>     >> Lianbo
>>>>     >>
>>>>
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/crash-utility
Contribution Guidelines: https://github.com/crash-utility/crash/wiki




[Index of Archives]     [Fedora Development]     [Fedora Desktop]     [Fedora SELinux]     [Yosemite News]     [KDE Users]     [Fedora Tools]

 

Powered by Linux