Re: Kernel Oops on alpha with kernel version >=6.9.x

Magnus Lindholm <linmag7@xxxxxxxxx> · Wed, 4 Dec 2024 23:22:03 +0100

I've been looking a bit closer at the RCU problem on Alpha. In the case
of the bug related to interface renaming after the changes in the
networking code, the code fails with an invalid pointer reference. From
the stack trace one can conclude that this happens when
synchronize_rcu_expedited() is used instead of synchronize_rcu_normal().
The use of rcu_normal can be enforced by setting the kernel parameter
rcupdate.rcu_normal=1 at boot. This makes recent kernels boot again on
my Alphas, a simple enough workaround for now.
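
As a rough mental model of why the boot parameter helps, the sketch
below mimics the fallback in user-space C. It is not kernel code:
model_rcu_normal, model_synchronize_rcu_expedited(), model_demo() and the
gp_path values are made-up names for illustration, and the real decision
in the kernel depends on more than a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rcupdate.rcu_normal boot parameter. */
static bool model_rcu_normal;

/* Which grace-period path a request ended up taking (illustrative only). */
enum gp_path { GP_PATH_NORMAL, GP_PATH_EXPEDITED };

/*
 * Model of an expedited request: with the flag set, the expedited
 * machinery (the part that oopses on Alpha) is never entered.
 */
static enum gp_path model_synchronize_rcu_expedited(void)
{
	if (model_rcu_normal)
		return GP_PATH_NORMAL;    /* fall back to a normal grace period */
	return GP_PATH_EXPEDITED;         /* would hand work to wait_rcu_exp_gp() */
}

/* Usage: the same request takes a different path depending on the flag. */
static void model_demo(void)
{
	model_rcu_normal = false;
	assert(model_synchronize_rcu_expedited() == GP_PATH_EXPEDITED);

	model_rcu_normal = true;
	assert(model_synchronize_rcu_expedited() == GP_PATH_NORMAL);
}
```

With the flag set, callers still wait for a grace period, just a slower
one, which is why this works as a workaround rather than a fix.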

The code fails inside the work-queue handler wait_rcu_exp_gp() when it's
trying to call rcu_exp_sel_wait_wake(). Looking at the code generated by
the compiler, the call to rcu_exp_sel_wait_wake() appears to be inlined,
so there is no actual call to this function. If I add some bogus code
(i.e. a print call that references the address of a local variable,
something that the compiler can't optimize away) before the call to
rcu_exp_sel_wait_wake(), the code works! The same effect is achieved by
declaring the local variable as volatile.
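
For readers who want to experiment outside the kernel, here is a minimal
user-space sketch of the same handler shape. The struct layouts,
container_of_model() and handler_model() are simplified stand-ins I made
up for illustration, not the kernel definitions. Note that in a
declaration like "volatile struct foo *p" the qualifier applies to the
pointed-to struct, so what it forces is a real memory access for
p->member.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins for the kernel types; layouts are illustrative only. */
struct kthread_work_model {
	int pending;
};

struct rcu_exp_work_model {
	unsigned long rew_s;                 /* sequence snapshot */
	struct kthread_work_model rew_work;  /* embedded work item */
};

/* Simplified container_of(): step back from a member to its enclosing struct. */
#define container_of_model(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Same shape as wait_rcu_exp_gp(). Because the pointee is volatile, the
 * read of rewp->rew_s has to be performed as an actual load and cannot be
 * elided or merged with another access by the optimizer.
 */
static unsigned long handler_model(struct kthread_work_model *wp)
{
	volatile struct rcu_exp_work_model *rewp;

	assert(wp != NULL);
	rewp = container_of_model(wp, struct rcu_exp_work_model, rew_work);
	return rewp->rew_s;  /* the kernel passes this to rcu_exp_sel_wait_wake() */
}
```

Compiling such a handler with and without the qualifier and comparing the
generated assembly is one way to see what the compiler changes, though a
user-space build will not necessarily reproduce the kernel's code
generation.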

I've also noted a similar behavior in the SCSI driver code, where
unloading a SCSI driver kernel module (in my case qla1280) will trigger
a kernel Oops. As in the example above, this can be mitigated by adding
a reference to local variables. When doing "rmmod qla1280",
scsi_host_dev_release() calls rcu_barrier(). In this function call I
noticed that the stack was somehow corrupted and the return address to
scsi_host_dev_release() was overwritten. The stack corruption occurs in
the "for_each_possible_cpu(cpu)" loop inside rcu_barrier(). Below are
stack dumps from before/after the for_each_possible_cpu loop. The call
to scsi_host_dev_release disappears from the stack trace since its
return address (fffffc0000b6a3ec) is replaced by a '1'. At the end of
the call to rcu_barrier() we get a kernel Oops, since $ra=1 is used as
the return address.
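
The two dumps can also be compared mechanically. The small user-space
helper below (diff_words() is a name I made up for this sketch) takes the
first eight words of each dump, copied from the traces in this mail, and
records which slots differ. The remaining sixteen words are identical in
both dumps, so they are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DUMP_WORDS 8

/* First two rows of the "inside rcu_barrier 5" dump (before the loop). */
static const uint64_t dump_before[DUMP_WORDS] = {
	0xfffffc000987fc88ULL, 0xfffffc0000e66440ULL,
	0xfffffc00003a8bc8ULL, 0x0000000000000000ULL,
	0xfffffc0000e667b0ULL, 0xfffffc000480b5d8ULL,
	0xfffffc0000b6a3ecULL, 0xfffffc0004a2a000ULL,
};

/* First two rows of the "inside rcu_barrier 6" dump (after the loop). */
static const uint64_t dump_after[DUMP_WORDS] = {
	0xfffffc000987fc88ULL, 0xfffffc0000e66440ULL,
	0xfffffc00003a8c44ULL, 0x0000000000000002ULL,
	0xfffffc0000e667b0ULL, 0xfffffc0000e44240ULL,
	0x0000000000000001ULL, 0xfffffc0004a2a000ULL,
};

/* Store the indices of differing words in out[]; return how many differ. */
static size_t diff_words(const uint64_t *a, const uint64_t *b,
			 size_t n, size_t *out)
{
	size_t count = 0;

	assert(a != NULL && b != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (a[i] != b[i])
			out[count++] = i;
	}
	return count;
}
```

Called as diff_words(dump_before, dump_after, DUMP_WORDS, idx), this
reports four differing slots: indices 2, 3, 5 and 6. Index 2 is the
rcu_barrier address that the traces list (+0x1f8 versus +0x274), and
index 6 is the slot where fffffc0000b6a3ec became 1.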

In both RCU cases above, stack corruption occurs, and the sections that
cause problems involve the use of kernel threads, so concurrency might be
an issue here. Since the RCU code works on other platforms and can be
"fixed" on Alpha as well just by declaring certain variables as volatile
(or by other means making sure that they are not optimized away from the
code), can this be a compiler issue on Alpha, or is it the result of not
taking proper measures in the code to account for the weak memory model
on Alpha? Or a combination of the two?
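
To make the second possibility concrete, this is the kind of ordering
"measure" I have in mind, sketched with C11 atomics in user space. The
names payload, ready, publish() and try_consume() are hypothetical; the
kernel uses its own primitives such as smp_store_release() and
smp_load_acquire() rather than <stdatomic.h>. The sketch only illustrates
the pattern and does not show that missing ordering is the cause here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;        /* plain data handed from one thread to another */
static atomic_bool ready;  /* publication flag, starts out false */

/* Producer side: write the data first, then publish with release ordering. */
static void publish(int value)
{
	payload = value;
	atomic_store_explicit(&ready, true, memory_order_release);
}

/*
 * Consumer side: an acquire load that observes the flag is guaranteed to
 * also observe the payload written before the release store.
 */
static bool try_consume(int *out)
{
	assert(out != NULL);
	if (!atomic_load_explicit(&ready, memory_order_acquire))
		return false;
	*out = payload;
	return true;
}
```

In real use publish() and try_consume() would run on different threads.
Without the release/acquire pair, a weakly ordered CPU is allowed to make
the flag visible before the payload.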


/Magnus Lindholm


Stack traces showing the corrupted stack frames:
----------------------------------------------------------------

rcu: inside rcu_barrier 5
CPU: 1 UID: 0 PID: 1430 Comm: rmmod Not tainted 6.12.1-gentoo #43
        fffffc000987fc88 fffffc0000e66440 fffffc00003a8bc8 0000000000000000
        fffffc0000e667b0 fffffc000480b5d8 fffffc0000b6a3ec fffffc0004a2a000
        fffffc0004a2a240 fffffc000480b5d8 0000000000000000 fffffffc00502068
        0000020001043480 00000200010422a0 0000000000000000 0000000000000000
        fffffc0000b68efc fffffc0004a2a240 fffffc0006319300 0000000000000000
        fffffc0000b2ed80 fffffc0004a2a240 fffffc0000b9d278 0000000000000000
 Trace:
 [<fffffc00003a8bc8>] rcu_barrier+0x1f8/0x580
 [<fffffc0000b6a3ec>] scsi_host_dev_release+0xac/0x1cc
 [<fffffc0000b68efc>] device_release+0x148/0x218
 [<fffffc0000b2ed80>] kobject_put+0x1d0/0x270
 [<fffffc00007cac3c>] put_device+0x1c/0x30
 [<fffffc00007f47cc>] scsi_host_put+0x1c/0x30
 [<fffffc00007554a4>] pci_device_remove+0x34/0x90
 [<fffffc00007d5c04>] device_remove+0x64/0xb0
 [<fffffc00007d7694>] device_release_driver_internal+0x294/0x380
 [<fffffc00007d783c>] driver_detach+0x7c/0x110
 [<fffffc00007d5240>] bus_remove_driver+0xa0/0x150
 [<fffffc00007d80c4>] driver_unregister+0x44/0xa0
 [<fffffc00007552f8>] pci_unregister_driver+0x38/0xd0
 [<fffffc00003bbb7c>] sys_delete_module+0x19c/0x320
 [<fffffc0000310d34>] entSys+0xa4/0xc0


rcu: inside rcu_barrier 6
CPU: 1 UID: 0 PID: 1430 Comm: rmmod Not tainted 6.12.1-gentoo #43
        fffffc000987fc88 fffffc0000e66440 fffffc00003a8c44 0000000000000002
        fffffc0000e667b0 fffffc0000e44240 0000000000000001 fffffc0004a2a000
        fffffc0004a2a240 fffffc000480b5d8 0000000000000000 fffffffc00502068
        0000020001043480 00000200010422a0 0000000000000000 0000000000000000
        fffffc0000b68efc fffffc0004a2a240 fffffc0006319300 0000000000000000
        fffffc0000b2ed80 fffffc0004a2a240 fffffc0000b9d278 0000000000000000
 Trace:
 [<fffffc00003a8c44>] rcu_barrier+0x274/0x580
 [<fffffc0000b68efc>] device_release+0x148/0x218
 [<fffffc0000b2ed80>] kobject_put+0x1d0/0x270
 [<fffffc00007cac3c>] put_device+0x1c/0x30
 [<fffffc00007f47cc>] scsi_host_put+0x1c/0x30
 [<fffffc00007554a4>] pci_device_remove+0x34/0x90
 [<fffffc00007d5c04>] device_remove+0x64/0xb0
 [<fffffc00007d7694>] device_release_driver_internal+0x294/0x380
 [<fffffc00007d783c>] driver_detach+0x7c/0x110
 [<fffffc00007d5240>] bus_remove_driver+0xa0/0x150
 [<fffffc00007d80c4>] driver_unregister+0x44/0xa0
 [<fffffc00007552f8>] pci_unregister_driver+0x38/0xd0
 [<fffffc00003bbb7c>] sys_delete_module+0x19c/0x320
 [<fffffc0000310d34>] entSys+0xa4/0xc0


Unable to handle kernel paging request at virtual address 0000000000000000
CPU 1
rmmod(1430): Oops -1
 pc = [<0000000000000000>]  ra = [<0000000000000001>]  ps = 0000    Not tainted
 pc is at 0x0
 ra is at 0x1
 v0 = 0000000000000007  t0 = fffffc0000ec7aa8  t1 = ffffffffffffffff
 t2 = fffffc0000e65df0  t3 = 00000000000026f0  t4 = 00000000000028f1
 t5 = 00000000000c2e20  t6 = 00000000000c2e68  t7 = fffffc000987c000
 s0 = fffffc0004a2a000  s1 = fffffc0004a2a240  s2 = fffffc000480b5d8
 s3 = 0000000000000000  s4 = fffffffc00502068  s5 = 0000020001043480
 s6 = 00000200010422a0
 a0 = 0000000000000000  a1 = 0000000000000001  a2 = 00000000000028f0
 a3 = fffffc000987fa38  a4 = 0000000000000000  a5 = 0000000000000000
 t8 = 00000000000c2e20  t9 = ffffffffffffffec  t10= 0000000000000001
 t11= 00000001000024f0  pv = fffffc000038a1f0  at = 0000000000000000
 gp = fffffc0000eb7aa8  sp = 00000000183e6a07
 Disabling lock debugging due to kernel taint
 Trace:
 [<fffffc0000b68efc>] device_release+0x148/0x218
 [<fffffc0000b2ed80>] kobject_put+0x1d0/0x270
 [<fffffc00007cac3c>] put_device+0x1c/0x30
 [<fffffc00007f47cc>] scsi_host_put+0x1c/0x30
 [<fffffc00007554a4>] pci_device_remove+0x34/0x90
 [<fffffc00007d5c04>] device_remove+0x64/0xb0
 [<fffffc00007d7694>] device_release_driver_internal+0x294/0x380
 [<fffffc00007d783c>] driver_detach+0x7c/0x110
 [<fffffc00007d5240>] bus_remove_driver+0xa0/0x150
 [<fffffc00007d80c4>] driver_unregister+0x44/0xa0
 [<fffffc00007552f8>] pci_unregister_driver+0x38/0xd0
 [<fffffc00003bbb7c>] sys_delete_module+0x19c/0x320
 [<fffffc0000310d34>] entSys+0xa4/0xc0


Below are the changes I made to the kernel source in order to mitigate
the stack corruption problem. This is not really a fix, but it can be of
use to gain further knowledge of what's really going on:
------------------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ff98233d4aa5..8241313404f7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4553,7 +4553,7 @@ static void rcu_barrier_handler(void *cpu_in)
  */
 void rcu_barrier(void)
 {
-       uintptr_t cpu;
+       volatile uintptr_t cpu;
        unsigned long flags;
        unsigned long gseq;
        struct rcu_data *rdp;
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index fb664d3a01c9..afba0ebc80e4 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -477,7 +477,7 @@ static inline void sync_rcu_exp_select_cpus_flush_work(struct rcu_node *rnp)
  */
 static void wait_rcu_exp_gp(struct kthread_work *wp)
 {
-       struct rcu_exp_work *rewp;
+       volatile struct rcu_exp_work *rewp;

        rewp = container_of(wp, struct rcu_exp_work, rew_work);
        rcu_exp_sel_wait_wake(rewp->rew_s);
@@ -705,6 +705,7 @@ static void rcu_exp_wait_wake(unsigned long s)
  */
 static void rcu_exp_sel_wait_wake(unsigned long s)
 {
+       pr_warn("inside rcu_exp_sel_wait_wake, %llx\n",(void*)s);
        /* Initialize the rcu_node tree in preparation for the wait. */
        sync_rcu_exp_select_cpus();






On Sun, Dec 1, 2024 at 6:04 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Sun, Dec 01, 2024 at 11:09:10AM +0100, Magnus Lindholm wrote:
> > On Sun, Dec 1, 2024 at 5:31 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > > Does booting with the "rcupdate.rcu_normal=1" kernel boot parameter
> > > also suppress the problem?
> >
> > Setting rcupdate.rcu_normal=1 also suppresses the problem. I guess this makes
> > the RCU code do synchronize_rcu_normal() instead of the full
> > synchronize_rcu_expedited(), which is where I get the kernel Oops.
>
> Exactly, though the effect is that any call to synchronize_rcu_expedited()
> instead results in a call to synchronize_rcu().
>
> Which means that you can work around this problem without having to
> carry patches and without having to slow down network configuration for
> everyone else.  ;-)
>
> > > That "pc =" down below is the program counter?  If so, I am at a loss
> > > as to what RCU could do to make it be zero.
> >
> > Not sure why this happens; perhaps the RCU code is passing around pointers to
> > worker functions and this somehow ends up being a null pointer on the Alpha?
>
> Are frame pointers enabled on your setup?  If not, could you please
> enable them and reproduce the problem?  Could you also please try
> building and reproducing with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y?
>
>                                                         Thanx, Paul