Re: [PATCH -next V17 4/7] riscv: entry: Convert to generic entry

Guo Ren <guoren@xxxxxxxxxx> · Sat, 1 Jul 2023 11:08:48 +0800



On Sat, Jul 1, 2023 at 10:55 AM Guo Ren <guoren@xxxxxxxxxx> wrote:
>
> On Fri, Jun 30, 2023 at 10:51 PM Daniel Thompson
> <daniel.thompson@xxxxxxxxxx> wrote:
> >
> > On Fri, Jun 30, 2023 at 07:22:40AM -0400, Guo Ren wrote:
> > > On Fri, Jun 30, 2023 at 7:16 AM Guo Ren <guoren@xxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jun 29, 2023 at 10:02 AM Daniel Thompson
> > > > <daniel.thompson@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Feb 21, 2023 at 10:30:18PM -0500, guoren@xxxxxxxxxx wrote:
> > > > > > From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
> > > > > >
> > > > > > This patch converts riscv to use the generic entry infrastructure from
> > > > > > kernel/entry/*. The generic entry makes maintainers' work easier and
> > > > > > codes more elegant. Here are the changes:
> > > > > >
> > > > > >  - More clear entry.S with handle_exception and ret_from_exception
> > > > > >  - Get rid of complex custom signal implementation
> > > > > >  - Move syscall procedure from assembly to C, which is much more
> > > > > >    readable.
> > > > > >  - Connect ret_from_fork & ret_from_kernel_thread to generic entry.
> > > > > >  - Wrap with irqentry_enter/exit and syscall_enter/exit_from_user_mode
> > > > > >  - Use the standard preemption code instead of custom
> > > > > >
> > > > > > Suggested-by: Huacai Chen <chenhuacai@xxxxxxxxxx>
> > > > > > Reviewed-by: Björn Töpel <bjorn@xxxxxxxxxxxx>
> > > > > > Tested-by: Yipeng Zou <zouyipeng@xxxxxxxxxx>
> > > > > > Tested-by: Jisheng Zhang <jszhang@xxxxxxxxxx>
> > > > > > Signed-off-by: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
> > > > > > Signed-off-by: Guo Ren <guoren@xxxxxxxxxx>
> > > > > > Cc: Ben Hutchings <ben@xxxxxxxxxxxxxxx>
> > > > >
> > > > > Apologies for the late feedback but I've been swamped lately and only
> > > > > recently got round to running the full kgdb test suite on the v6.4
> > > > > series.
> > > > >
> > > > > The kgdb test suite includes a couple of tests that verify that the
> > > > > system resumes after breakpointing due to a BUG():
> > > > > https://github.com/daniel-thompson/kgdbtest/blob/master/tests/test_kdb_fault_injection.py#L24-L45
> > > > >
> > > > > These tests have regressed on riscv between v6.3 and v6.4 and a bisect
> > > > > is pointing at this patch. With these changes in place then, after kdb
> > > > > resumes the system, the BUG() message is printed as normal but then
> > > > > immediately fails. From the backtrace it looks like the new entry/exit
> > > > > code cannot advance past a compiled breakpoint instruction:
> > > > > ~~~
> > > > > PANIC: Fatal exception in interrupt
> > > > It comes from:
> > > > void die(struct pt_regs *regs, ...
> > > > {
> > > > ...
> > > > if (in_interrupt())
> > > >         panic("Fatal exception in interrupt");
> > > > ...
> > > >
> > > > We could add a dump_backtrace to see what happened:
> > > > if (in_interrupt()) {
> > > > +      dump_backtrace(regs, NULL, KERN_DEFAULT);
> > > Sorry, it should be:
> > > +        dump_backtrace(NULL, NULL, KERN_DEFAULT);
> > > We need current stack info, not exception context.
> >
> > I added this... and I also stopped kgdb from intercepting the panic()
> > since that interferes with the console output from dump_backtrace().
> >
> > ~~~
> > # /bin/echo BUG > /sys/kernel/debug/provoke-crash/DIRECT
> > [    3.380565] lkdtm: Performing direct entry BUG
> >
> > Entering kdb (current=0xff6000000380ab00, pid 98) on processor 0 due to NonMaskable Interrupt @ 0xffffffff8064b844
> > kdb> go
> > Catastrophic error detected
> > kdb_continue_catastrophic=0, type go a second time if you really want to continue
> > kdb> go
> > Catastrophic error detected
> > kdb_continue_catastrophic=0, attempting to continue
> > [    3.381411] ------------[ cut here ]------------
> > [    3.381454] kernel BUG at drivers/misc/lkdtm/bugs.c:78!
> > [    3.381609] Kernel BUG [#1]
> > [    3.381632] Modules linked in:
> > [    3.381734] CPU: 0 PID: 98 Comm: echo Not tainted 6.4.0-rc6-00004-ge6e9d4598760-dirty #126
> > [    3.381817] Hardware name: riscv-virtio,qemu (DT)
> > [    3.381885] epc : lkdtm_BUG+0x6/0x8
> > [    3.381959]  ra : lkdtm_do_action+0x10/0x1c
> > [    3.381978] epc : ffffffff8064b844 ra : ffffffff8064afb4 sp : ff200000008c3d30
> > [    3.381991]  gp : ffffffff810665a0 tp : ff6000000380ab00 t0 : 6500000000000000
> > [    3.382002]  t1 : 0000000000000001 t2 : 6550203a6d74646b s0 : ff200000008c3d40
> > [    3.382012]  s1 : ff60000003988000 a0 : ffffffff80fc0260 a1 : ff6000003ffad788
> > [    3.382023]  a2 : ff6000003ffb9530 a3 : 0000000000000000 a4 : 0000000000000000
> > [    3.382034]  a5 : ffffffff8064b83e a6 : 0000000000000050 a7 : 0000000000040000
> > [    3.382045]  s2 : 0000000000000004 s3 : ffffffff80fc0260 s4 : ff200000008c3e70
> > [    3.382056]  s5 : ff600000033223a8 s6 : 00000000000f0cc0 s7 : ff60000002211000
> > [    3.382066]  s8 : 00ffffffafc50c08 s9 : 00ffffffafc4b9b8 s10: 0000000000000000
> > [    3.382077]  s11: 0000000000000001 t3 : 461f715700000000 t4 : 0000000000000002
> > [    3.382087]  t5 : 0000000000000000 t6 : ff200000008c3b58
> > [    3.382097] status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000003
> > [    3.382139] [<ffffffff8064b844>] lkdtm_BUG+0x6/0x8
> > [    3.382245] Code: 0513 9245 b097 0039 80e7 7f20 bf39 1141 e422 0800 (9002) 1141
> > [    3.594697] ---[ end trace 0000000000000000 ]---
> >
> > At this point we expect a shell prompt since we should have taken the BUG(),
> > killed the echo process and returned to the shell. However in v6.4 we get the
> > following instead (including the instrumentation you asked for):
>
> After comparing with arm64, I found that arm64 uses spinlock_irq to
> protect the in_interrupt(). I think this would make in_interrupt() =
> 0.
>
> So how about trying:
>
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 5158961ea977..0ac914a99ee3 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -82,13 +82,15 @@ void die(struct pt_regs *regs, const char *str)
>
>         bust_spinlocks(0);
>         add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
> -       spin_unlock_irqrestore(&die_lock, flags);
>         oops_exit();
>
>         if (in_interrupt())
>                 panic("Fatal exception in interrupt");
>         if (panic_on_oops)
>                 panic("Fatal exception");
> +
> +       spin_unlock_irqrestore(&die_lock, flags);
En... It seems it's not correct, how can I reproduce your environment
on qemu? Sorry, I'm not familiar with kgdb.

> +
>         if (ret != NOTIFY_STOP)
>                 make_task_dead(SIGSEGV);
>  }
>
> >
> > [    3.594801] [<ffffffff80005e3a>] dump_backtrace+0x1c/0x24
> > [    3.594826] [<ffffffff800059f0>] die+0x228/0x238
> > [    3.594835] [<ffffffff80005b38>] handle_break+0x9a/0xe0
> > [    3.594843] [<ffffffff809f30d6>] do_trap_break+0x48/0x5c
> > [    3.594854] [<ffffffff80003ee4>] ret_from_exception+0x0/0x64
> > [    3.594862] [<ffffffff8064b844>] lkdtm_BUG+0x6/0x8
> > [    3.594959] Kernel panic - not syncing: Fatal exception in interrupt
> > [    3.595005] SMP: stopping secondary CPUs
> > [    3.596444] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> > ~~~
> >
> >
> > Daniel.
>
>
>
> --
> Best Regards
>  Guo Ren


-- 
Best Regards
 Guo Ren