[Crash-utility] Re: [PATCH] Fix for "bt" command incorrectly printing "bogus exception frame" warning

On 2024/03/26 17:25, lijiang wrote:
> On Tue, Mar 26, 2024 at 2:59 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@xxxxxxx> wrote:
> 
>     On 2024/03/26 15:44, lijiang wrote:
>      > Thanks for the comment, Kazu.
>      >
>      > On Tue, Mar 26, 2024 at 10:28 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@xxxxxxx> wrote:
>      >
>      >     Hi Lianbo,
>      >
>      >     thanks for the patch.
>      >
>      >     What is the kernel version of this vmcore?
>      >
>      >
>      > The kernel version is 5.14.0, but I haven't reproduced it; it does not
>      > seem easy to reproduce.
> 
>     I see, thanks.
> 
>     If it's a RHEL kernel, please let me know the release number e.g.
>     5.14.0-362.8.1.el9_3.x86_64 ?
> 
> 
> Not 8.1; it's 5.14.0-362.2.1.el9_3.x86_64.
> 
>      >
>      >     and could I have the "bt 0 -c 8 | tail -n 30" output?
>      >
>      > crash> bt 0 -c 8 | tail -n 30
> 
>     Oh, my bad, I left out the "bt -r" option.
>     How about "bt 0 -c 8 -r | tail -n 30"?
> 
> crash> bt 0 -c 8 -r | tail -n 30
> ffffbec3c022fe20:  0000000000000000 0000000000000000
> ffffbec3c022fe30:  ffff9948c08f6278 pick_next_task+82
> ffffbec3c022fe40:  ffffbec3c022fea0 0000000000000000
> ffffbec3c022fe50:  0000000000000000 __switch_to_asm+58
> ffffbec3c022fe60:  finish_task_switch+140 0000000000000000
> ffffbec3c022fe70:  ffff9948c08f5640 ffff9948e6f03980
> ffffbec3c022fe80:  0000000000000000 tick_nohz_next_event+90
> ffffbec3c022fe90:  ffff994c2f2a2ae0 0000000000000000
> ffffbec3c022fea0:  0000000000000000 0000000000000008
> ffffbec3c022feb0:  ct_kernel_enter.constprop.0+64 0000000000000046
> ffffbec3c022fec0:  read_tsc         ktime_get+56
> ffffbec3c022fed0:  0000000000000000 __flush_smp_call_function_queue+206
> ffffbec3c022fee0:  0000000000000286 ffff9948c08f5640
> ffffbec3c022fef0:  0000000000000046 0000000000000286
> ffffbec3c022ff00:  flush_smp_call_function_queue+72 0000000000000008
> ffffbec3c022ff10:  do_idle+168      0000000040000000
> ffffbec3c022ff20:  0000000000000094 cpu_startup_entry+25
> ffffbec3c022ff30:  0000000000000000 start_secondary+269
> ffffbec3c022ff40:  000000089726a2d0 e48885e126bc1600
> ffffbec3c022ff50:  secondary_startup_64_no_verify+229 0000000000000000
> ffffbec3c022ff60:  0000000000000000 0000000000000000
> ffffbec3c022ff70:  0000000000000000 0000000000000000
> ffffbec3c022ff80:  0000000000000000 0000000000000000
> ffffbec3c022ff90:  0000000000000000 0000000000000000
> ffffbec3c022ffa0:  0000000000000000 0000000000000000
> ffffbec3c022ffb0:  0000000000000000 0000000000000000
> ffffbec3c022ffc0:  0000000000000000 0000000000000000
> ffffbec3c022ffd0:  0000000000000000 0000000000000000
> ffffbec3c022ffe0:  0000000000000000 0000000000000000
> ffffbec3c022fff0:  0000000000000000 0000000000000000
> crash>
>
>     Thanks,
>     Kazu
> 
>      >   #4 [fffffe1788788ef0] end_repeat_nmi at ffffffff980015f9
>      >      [exception RIP: __update_load_avg_se+13]
>      >      RIP: ffffffff9736b16d  RSP: ffffbec3c08acc78  RFLAGS: 00000046
>      >      RAX: 0000000000000000  RBX: ffff994c2f2b1a40  RCX: ffffbec3c08acdc0
>      >      RDX: ffff9948e4fe1d80  RSI: ffff994c2f2b1a40  RDI: 0000001d7ad7d55d
>      >      RBP: ffffbec3c08acc88   R8: 0000001d921fca6f   R9: ffff994c2f2b1328
>      >      R10: 00000000fffd0010  R11: ffffffff98e060c0  R12: 0000001d7ad7d55d
>      >      R13: 0000000000000005  R14: ffff994c2f2b19c0  R15: 0000000000000001
>      >      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>      > --- <NMI exception stack> ---
>      >   #5 [ffffbec3c08acc78] __update_load_avg_se at ffffffff9736b16d
>      >   #6 [ffffbec3c08acce0] enqueue_entity at ffffffff9735c9ab
>      >   #7 [ffffbec3c08acd28] enqueue_task_fair at ffffffff9735cef8
>      >   #8 [ffffbec3c08acd60] enqueue_task at ffffffff973481fa
>      >   #9 [ffffbec3c08acd88] ttwu_do_activate at ffffffff9734aeed
>      > #10 [ffffbec3c08acdb0] try_to_wake_up at ffffffff9734c7d7
>      > #11 [ffffbec3c08ace08] __queue_work at ffffffff9732a4d2
>      > #12 [ffffbec3c08ace50] queue_work_on at ffffffff9732a6a4
>      > #13 [ffffbec3c08ace60] iomap_dio_bio_end_io at ffffffff976a7b4c
>      > #14 [ffffbec3c08ace90] clone_endio at ffffffffc090315f [dm_mod]
>      > #15 [ffffbec3c08aced0] blk_update_request at ffffffff9779b49d
>      > #16 [ffffbec3c08acf28] scsi_end_request at ffffffff97a3d5a7
>      > #17 [ffffbec3c08acf58] scsi_io_completion at ffffffff97a3e606
>      > #18 [ffffbec3c08acf90] blk_complete_reqs at ffffffff977978d0
>      > #19 [ffffbec3c08acfa0] __do_softirq at ffffffff97e66f7a
>      > #20 [ffffbec3c08acff0] do_softirq at ffffffff9730f6ef
>      > --- <IRQ stack> ---
>      > #21 [ffffbec3c022ff28] cpu_startup_entry at ffffffff973684a9
>      > #22 [ffffbec3c022ff38] start_secondary at ffffffff9726a3dd
>      > #23 [ffffbec3c022ff50] secondary_startup_64_no_verify at ffffffff9720015a
>      > crash>
>      >
>      >     If it's RHEL9, do_softirq is probably called via this path:
>      >
>      >     cpu_startup_entry
>      >         do_idle
>      >           flush_smp_call_function_queue
>      >             do_softirq
>      >
>      >     but do_idle is skipped as shown below, so I'd like to check just in case.
>      >
>      > Good question. I noticed the call trace, but this may be another issue.

Thank you for the bt -r information. Yes, it looks like those frames are
skipped, probably due to x86_64_irq_eframe_link, but I don't have a good
idea how to handle that for now. Let's fix this issue first.

I've moved "do_softirq" so that it is checked first, and applied the patch:
https://github.com/crash-utility/crash/commit/ce47cb8dabb56c88e2d753026a9fdc83f83a5f5d

Thanks,
Kazu


>      > Thanks
>      > Lianbo
>      >
>      >       >    #20 [ffffbec3c08acff0] do_softirq at ffffffff9730f6ef
>      >       >    --- <IRQ stack> ---
>      >       >    #21 [ffffbec3c022ff28] cpu_startup_entry at ffffffff973684a9
>      >
>      >     Thanks,
>      >     Kazu
>      >
>      >
>      >     On 2024/03/19 16:59, Lianbo Jiang wrote:
>      >      > The "bogus exception frame" warning was observed again on a specific
>      >      > vmcore from an x86_64 machine, and the remaining frames were truncated
>      >      > when executing the "bt" command, as shown below:
>      >      >
>      >      >    crash> bt 0 -c 8
>      >      >    PID: 0        TASK: ffff9948c08f5640  CPU: 8    COMMAND: "swapper/8"
>      >      >     #0 [fffffe1788788e58] crash_nmi_callback at ffffffff972672bb
>      >      >     #1 [fffffe1788788e68] nmi_handle at ffffffff9722eb8e
>      >      >     #2 [fffffe1788788eb0] default_do_nmi at ffffffff97e51cd0
>      >      >     #3 [fffffe1788788ed0] exc_nmi at ffffffff97e51ee1
>      >      >     #4 [fffffe1788788ef0] end_repeat_nmi at ffffffff980015f9
>      >      >        [exception RIP: __update_load_avg_se+13]
>      >      >        RIP: ffffffff9736b16d  RSP: ffffbec3c08acc78  RFLAGS: 00000046
>      >      >        RAX: 0000000000000000  RBX: ffff994c2f2b1a40  RCX: ffffbec3c08acdc0
>      >      >        RDX: ffff9948e4fe1d80  RSI: ffff994c2f2b1a40  RDI: 0000001d7ad7d55d
>      >      >        RBP: ffffbec3c08acc88   R8: 0000001d921fca6f   R9: ffff994c2f2b1328
>      >      >        R10: 00000000fffd0010  R11: ffffffff98e060c0  R12: 0000001d7ad7d55d
>      >      >        R13: 0000000000000005  R14: ffff994c2f2b19c0  R15: 0000000000000001
>      >      >        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>      >      >    --- <NMI exception stack> ---
>      >      >     #5 [ffffbec3c08acc78] __update_load_avg_se at ffffffff9736b16d
>      >      >     #6 [ffffbec3c08acce0] enqueue_entity at ffffffff9735c9ab
>      >      >     #7 [ffffbec3c08acd28] enqueue_task_fair at ffffffff9735cef8
>      >      >     #8 [ffffbec3c08acd60] enqueue_task at ffffffff973481fa
>      >      >     #9 [ffffbec3c08acd88] ttwu_do_activate at ffffffff9734aeed
>      >      >    #10 [ffffbec3c08acdb0] try_to_wake_up at ffffffff9734c7d7
>      >      >    #11 [ffffbec3c08ace08] __queue_work at ffffffff9732a4d2
>      >      >    #12 [ffffbec3c08ace50] queue_work_on at ffffffff9732a6a4
>      >      >    #13 [ffffbec3c08ace60] iomap_dio_bio_end_io at ffffffff976a7b4c
>      >      >    #14 [ffffbec3c08ace90] clone_endio at ffffffffc090315f [dm_mod]
>      >      >    #15 [ffffbec3c08aced0] blk_update_request at ffffffff9779b49d
>      >      >    #16 [ffffbec3c08acf28] scsi_end_request at ffffffff97a3d5a7
>      >      >    #17 [ffffbec3c08acf58] scsi_io_completion at ffffffff97a3e606
>      >      >    #18 [ffffbec3c08acf90] blk_complete_reqs at ffffffff977978d0
>      >      >    #19 [ffffbec3c08acfa0] __do_softirq at ffffffff97e66f7a
>      >      >    #20 [ffffbec3c08acff0] do_softirq at ffffffff9730f6ef
>      >      >    --- <IRQ stack> ---
>      >      >    #21 [ffffbec3c022ff18] do_idle at ffffffff97368288
>      >      >        [exception RIP: unknown or invalid address]
>      >      >        RIP: 0000000000000000  RSP: 0000000000000000  RFLAGS: 00000000
>      >      >        RAX: 0000000000000000  RBX: 000000089726a2d0  RCX: 0000000000000000
>      >      >        RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
>      >      >        RBP: ffffffff9726a3dd   R8: 0000000000000000   R9: 0000000000000000
>      >      >        R10: ffffffff9720015a  R11: e48885e126bc1600  R12: 0000000000000000
>      >      >        R13: ffffffff973684a9  R14: 0000000000000094  R15: 0000000040000000
>      >      >        ORIG_RAX: 0000000000000000  CS: 0000  SS: 0000
>      >      >    bt: WARNING: possibly bogus exception frame
>      >      >    crash>
>      >      >
>      >      > Actually, there is no exception frame when called from do_softirq().
>      >      > With the patch:
>      >      >
>      >      >    crash> bt 0 -c 8
>      >      >    PID: 0        TASK: ffff9948c08f5640  CPU: 8    COMMAND: "swapper/8"
>      >      >     #0 [fffffe1788788e58] crash_nmi_callback at ffffffff972672bb
>      >      >     #1 [fffffe1788788e68] nmi_handle at ffffffff9722eb8e
>      >      >     #2 [fffffe1788788eb0] default_do_nmi at ffffffff97e51cd0
>      >      >     #3 [fffffe1788788ed0] exc_nmi at ffffffff97e51ee1
>      >      >     #4 [fffffe1788788ef0] end_repeat_nmi at ffffffff980015f9
>      >      >        [exception RIP: __update_load_avg_se+13]
>      >      >        RIP: ffffffff9736b16d  RSP: ffffbec3c08acc78  RFLAGS: 00000046
>      >      >        RAX: 0000000000000000  RBX: ffff994c2f2b1a40  RCX: ffffbec3c08acdc0
>      >      >        RDX: ffff9948e4fe1d80  RSI: ffff994c2f2b1a40  RDI: 0000001d7ad7d55d
>      >      >        RBP: ffffbec3c08acc88   R8: 0000001d921fca6f   R9: ffff994c2f2b1328
>      >      >        R10: 00000000fffd0010  R11: ffffffff98e060c0  R12: 0000001d7ad7d55d
>      >      >        R13: 0000000000000005  R14: ffff994c2f2b19c0  R15: 0000000000000001
>      >      >        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>      >      >    --- <NMI exception stack> ---
>      >      >     #5 [ffffbec3c08acc78] __update_load_avg_se at ffffffff9736b16d
>      >      >     #6 [ffffbec3c08acce0] enqueue_entity at ffffffff9735c9ab
>      >      >     #7 [ffffbec3c08acd28] enqueue_task_fair at ffffffff9735cef8
>      >      >     #8 [ffffbec3c08acd60] enqueue_task at ffffffff973481fa
>      >      >     #9 [ffffbec3c08acd88] ttwu_do_activate at ffffffff9734aeed
>      >      >    #10 [ffffbec3c08acdb0] try_to_wake_up at ffffffff9734c7d7
>      >      >    #11 [ffffbec3c08ace08] __queue_work at ffffffff9732a4d2
>      >      >    #12 [ffffbec3c08ace50] queue_work_on at ffffffff9732a6a4
>      >      >    #13 [ffffbec3c08ace60] iomap_dio_bio_end_io at ffffffff976a7b4c
>      >      >    #14 [ffffbec3c08ace90] clone_endio at ffffffffc090315f [dm_mod]
>      >      >    #15 [ffffbec3c08aced0] blk_update_request at ffffffff9779b49d
>      >      >    #16 [ffffbec3c08acf28] scsi_end_request at ffffffff97a3d5a7
>      >      >    #17 [ffffbec3c08acf58] scsi_io_completion at ffffffff97a3e606
>      >      >    #18 [ffffbec3c08acf90] blk_complete_reqs at ffffffff977978d0
>      >      >    #19 [ffffbec3c08acfa0] __do_softirq at ffffffff97e66f7a
>      >      >    #20 [ffffbec3c08acff0] do_softirq at ffffffff9730f6ef
>      >      >    --- <IRQ stack> ---
>      >      >    #21 [ffffbec3c022ff28] cpu_startup_entry at ffffffff973684a9
>      >      >    #22 [ffffbec3c022ff38] start_secondary at ffffffff9726a3dd
>      >      >    #23 [ffffbec3c022ff50] secondary_startup_64_no_verify at ffffffff9720015a
>      >      >    crash>
>      >      >
>      >      > Reported-by: Jie Li <jieli@xxxxxxxxxx>
>      >      > Signed-off-by: Lianbo Jiang <lijiang@xxxxxxxxxx>
>      >      > ---
>      >      >   x86_64.c | 7 ++++---
>      >      >   1 file changed, 4 insertions(+), 3 deletions(-)
>      >      >
>      >      > diff --git a/x86_64.c b/x86_64.c
>      >      > index 502817d3b2bd..c672a0c3e8fc 100644
>      >      > --- a/x86_64.c
>      >      > +++ b/x86_64.c
>      >      > @@ -3841,11 +3841,12 @@ in_exception_stack:
>      >      >  		up -= 1;
>      >      >  		bt->instptr = *up;
>      >      >  		/*
>      >      > -		 *  No exception frame when coming from do_softirq_own_stack
>      >      > -		 *  or call_softirq.
>      >      > +		 *  No exception frame when coming from do_softirq_own_stack,
>      >      > +		 *  call_softirq or do_softirq.
>      >      >  		 */
>      >      >  		if ((sp = value_search(bt->instptr, &offset)) &&
>      >      > -		    (STREQ(sp->name, "do_softirq_own_stack") || STREQ(sp->name, "call_softirq")))
>      >      > +		    (STREQ(sp->name, "do_softirq_own_stack") || STREQ(sp->name, "call_softirq")
>      >      > +		     || STREQ(sp->name, "do_softirq")))
>      >      >  			irq_eframe = 0;
>      >      >  		bt->frameptr = 0;
>      >      >  		done = FALSE;
>      > 
> 
--
Crash-utility mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxxxxxx
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki