Re: [PATCH -next V17 4/7] riscv: entry: Convert to generic entry

Daniel Thompson <daniel.thompson@xxxxxxxxxx> · Fri, 30 Jun 2023 15:50:56 +0100

On Fri, Jun 30, 2023 at 07:22:40AM -0400, Guo Ren wrote:
> On Fri, Jun 30, 2023 at 7:16 AM Guo Ren <guoren@xxxxxxxxxx> wrote:
> >
> > On Thu, Jun 29, 2023 at 10:02 AM Daniel Thompson
> > <daniel.thompson@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Feb 21, 2023 at 10:30:18PM -0500, guoren@xxxxxxxxxx wrote:
> > > > From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
> > > >
> > > > This patch converts riscv to use the generic entry infrastructure from
> > > > kernel/entry/*. The generic entry makes maintainers' work easier and
> > > > codes more elegant. Here are the changes:
> > > >
> > > >  - More clear entry.S with handle_exception and ret_from_exception
> > > >  - Get rid of complex custom signal implementation
> > > >  - Move syscall procedure from assembly to C, which is much more
> > > >    readable.
> > > >  - Connect ret_from_fork & ret_from_kernel_thread to generic entry.
> > > >  - Wrap with irqentry_enter/exit and syscall_enter/exit_from_user_mode
> > > >  - Use the standard preemption code instead of custom
> > > >
> > > > Suggested-by: Huacai Chen <chenhuacai@xxxxxxxxxx>
> > > > Reviewed-by: Björn Töpel <bjorn@xxxxxxxxxxxx>
> > > > Tested-by: Yipeng Zou <zouyipeng@xxxxxxxxxx>
> > > > Tested-by: Jisheng Zhang <jszhang@xxxxxxxxxx>
> > > > Signed-off-by: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
> > > > Signed-off-by: Guo Ren <guoren@xxxxxxxxxx>
> > > > Cc: Ben Hutchings <ben@xxxxxxxxxxxxxxx>
> > >
> > > Apologies for the late feedback but I've been swamped lately and only
> > > recently got round to running the full kgdb test suite on the v6.4
> > > series.
> > >
> > > The kgdb test suite includes a couple of tests that verify that the
> > > system resumes after breakpointing due to a BUG():
> > > https://github.com/daniel-thompson/kgdbtest/blob/master/tests/test_kdb_fault_injection.py#L24-L45
> > >
> > > These tests have regressed on riscv between v6.3 and v6.4 and a bisect
> > > is pointing at this patch. With these changes in place then, after kdb
> > > resumes the system, the BUG() message is printed as normal but then
> > > immediately fails. From the backtrace it looks like the new entry/exit
> > > code cannot advance past a compiled breakpoint instruction:
> > > ~~~
> > > PANIC: Fatal exception in interrupt
> > It comes from:
> > void die(struct pt_regs *regs, ...
> > {
> > ...
> > if (in_interrupt())
> >         panic("Fatal exception in interrupt");
> > ...
> >
> > We could add a dump_backtrace to see what happened:
> > if (in_interrupt()) {
> > +      dump_backtrace(regs, NULL, KERN_DEFAULT);
> Sorry, it should be:
> +        dump_backtrace(NULL, NULL, KERN_DEFAULT);
> We need current stack info, not exception context.

I added this... and I also stopped kgdb from intercepting the panic()
since that interferes with the console output from dump_backtrace().

~~~
# /bin/echo BUG > /sys/kernel/debug/provoke-crash/DIRECT
[    3.380565] lkdtm: Performing direct entry BUG

Entering kdb (current=0xff6000000380ab00, pid 98) on processor 0 due to NonMaskable Interrupt @ 0xffffffff8064b844
kdb> go
Catastrophic error detected
kdb_continue_catastrophic=0, type go a second time if you really want to continue
kdb> go
Catastrophic error detected
kdb_continue_catastrophic=0, attempting to continue
[    3.381411] ------------[ cut here ]------------
[    3.381454] kernel BUG at drivers/misc/lkdtm/bugs.c:78!
[    3.381609] Kernel BUG [#1]
[    3.381632] Modules linked in:
[    3.381734] CPU: 0 PID: 98 Comm: echo Not tainted 6.4.0-rc6-00004-ge6e9d4598760-dirty #126
[    3.381817] Hardware name: riscv-virtio,qemu (DT)
[    3.381885] epc : lkdtm_BUG+0x6/0x8
[    3.381959]  ra : lkdtm_do_action+0x10/0x1c
[    3.381978] epc : ffffffff8064b844 ra : ffffffff8064afb4 sp : ff200000008c3d30
[    3.381991]  gp : ffffffff810665a0 tp : ff6000000380ab00 t0 : 6500000000000000
[    3.382002]  t1 : 0000000000000001 t2 : 6550203a6d74646b s0 : ff200000008c3d40
[    3.382012]  s1 : ff60000003988000 a0 : ffffffff80fc0260 a1 : ff6000003ffad788
[    3.382023]  a2 : ff6000003ffb9530 a3 : 0000000000000000 a4 : 0000000000000000
[    3.382034]  a5 : ffffffff8064b83e a6 : 0000000000000050 a7 : 0000000000040000
[    3.382045]  s2 : 0000000000000004 s3 : ffffffff80fc0260 s4 : ff200000008c3e70
[    3.382056]  s5 : ff600000033223a8 s6 : 00000000000f0cc0 s7 : ff60000002211000
[    3.382066]  s8 : 00ffffffafc50c08 s9 : 00ffffffafc4b9b8 s10: 0000000000000000
[    3.382077]  s11: 0000000000000001 t3 : 461f715700000000 t4 : 0000000000000002
[    3.382087]  t5 : 0000000000000000 t6 : ff200000008c3b58
[    3.382097] status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000003
[    3.382139] [<ffffffff8064b844>] lkdtm_BUG+0x6/0x8
[    3.382245] Code: 0513 9245 b097 0039 80e7 7f20 bf39 1141 e422 0800 (9002) 1141
[    3.594697] ---[ end trace 0000000000000000 ]---

At this point we expect a shell prompt since we should have taken the BUG(),
killed the echo process and returned to the shell. However in v6.4 we get the
following instead (including the instrumentation you asked for):

[    3.594801] [<ffffffff80005e3a>] dump_backtrace+0x1c/0x24
[    3.594826] [<ffffffff800059f0>] die+0x228/0x238
[    3.594835] [<ffffffff80005b38>] handle_break+0x9a/0xe0
[    3.594843] [<ffffffff809f30d6>] do_trap_break+0x48/0x5c
[    3.594854] [<ffffffff80003ee4>] ret_from_exception+0x0/0x64
[    3.594862] [<ffffffff8064b844>] lkdtm_BUG+0x6/0x8
[    3.594959] Kernel panic - not syncing: Fatal exception in interrupt
[    3.595005] SMP: stopping secondary CPUs
[    3.596444] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
~~~

Daniel.