Re: Kernel Oops on alpha with kernel version >=6.9.x

"Maciej W. Rozycki" <macro@xxxxxxxxxxx> · Fri, 27 Dec 2024 16:30:50 +0000 (GMT)

On Fri, 27 Dec 2024, Magnus Lindholm wrote:

> The code generated by gcc from smp.c reserves 96 bytes of stack space
> but places csd_stack struct on $sp+79. Since sizeof csd_stack is 32
> bytes, it seems to me that [($sp+79) & NOT(0x1f) +
> sizeof(call_single_data_t)] might be greater than "96+$sp" if say, bit
> 3 and 4 are set in $sp? or am I missing something here?

 Umm, no.  The psABI guarantees 16-byte alignment for the stack pointer, 
and under this condition (((x - 17) & ~31) + 32 <= x) is guaranteed to be 
true (except for the overflow case, of course, which does not apply here).

> ---------------------------
> lda     sp,-96(sp)
> ...
> lda     s0,79(sp)
> ...
> andnot  s0,0x1f,s0
> ...
> stq zero,8(s0)
> stq zero,0(s0)
> stq     zero,16(s0)
> stq     zero,24(s0)
> stl     t0,8(s0) [.node =  CSD_FLAG_LOCK | CSD_TYPE_SYNC]

 So `csd_stack' isn't at ($old_sp + 79) but rather (($old_sp + 79) & ~31), 
which is just a way to guarantee 32-byte alignment for the data object 
where the psABI only guarantees 16-byte alignment for stack allocations.

> 2)
> Using  cacheline_aligned_in_smp when declaring csd_stack in
> smp_call_function_single will actually reserve less stack space (80
> bytes in stead of 96), csd_stack is referenced directly using $sp.
> Maybe alignment is just a way to simplify things for gcc and avoid
> hitting compiler bugs?

 For the Alpha port generic definitions are used, so I wouldn't draw such 
conclusions.  It's just how things work out by default, and objects of the 
`call_single_data_t' type are aligned to their size, which is 32, while 
____cacheline_aligned_in_smp requests alignment to either 32 or 64 bytes, 
depending on the kernel configuration...

> --------------------------
> lda     sp,-80(sp)
> stq     zero,48(sp)
> stq     zero,64(sp)
> stq     zero,72(sp)
> stl     t0,56(sp) [.node =  CSD_FLAG_LOCK | CSD_TYPE_SYNC]

... so I don't really know why it causes the alignment to be decreased 
here back to the psABI value of 16 bytes.

> I've made numerous attempts with different versions of GCC, including
> the most recent git version (with and without the patches from Maciej)
> and they give similar results, even though the exact amount of
> stackspace reserved, registers used, and placement of csd_stack struct
> will differ somewhat. (GCC) 15.0.0 20241225, with Maciej patches
> applied, will produce the code below:
> 
> lda     sp,-112(sp)
> lda     t1,47(sp)
> andnot  t1,0x1f,t1
> ...
> 
> Which boots and lets me load/unload my scsi kernel moduel, but just
> adding some debug print statement to smp_call_function_single will
> again give a kernel null pointer exception. Printing the value of &csd
> seems to allocate space for csd on the stack instead of keeping it in
> registers which will later trigger a null pointer excepting when
> accessed. To me it seems like this just moves around the stack
> clobbering problem?

 A code generation bug cannot be ruled out of course.  But it might be an 
ISO C compliance bug too that just happens to trigger for this particular 
scenario.

> CPU 1
> rmmod(1444): Oops 1
> pc = [<fffffc000078e818>]  ra = [<fffffc00003dd0f8>]  ps = 0000    Not tainted
> pc is at llist_add_batch+0x8/0x50
> ra is at __smp_call_single_queue+0x38/0xa0
> v0 = 0000000000000000  t0 = fffffc0000e2b100  t1 = fffffc0000ec4048
> t2 = 0000000000000000  t3 = fffffc0000ec4048  t4 = 0000000000000000
> t5 = 0000000000000001  t6 = ffffffffffffffec  t7 = fffffc0005d4c000
> s0 = 0000000000000000  s1 = 0000000000000001  s2 = 0000000000000001
> s3 = 0000000000000001  s4 = fffffc0000cd0330  s5 = fffffc000020ee80
> s6 = 00000200010422a0
> a0 = 0000000000000000  a1 = 0000000000000000  a2 = fffffc000020f100
> a3 = fffffc0005d4fa28  a4 = ffff1020ffffff00  a5 = 0000000000000000
> t8 = 0000000000000001  t9 = 0000000000000001  t10= 0000000000000000
> t11= 0000000000000000  pv = fffffc000078e810  at = 0000000000000000
> gp = fffffc0000e9c980  sp = 00000000905861a6
> Disabling lock debugging due to kernel taint
> Trace:
> [<fffffc00003dd1bc>] generic_exec_single+0x5c/0x150
> [<fffffc00003dd3ec>] smp_call_function_single+0x13c/0x220
> [<fffffc000082ceec>] device_release+0x3c/0xf0
> [<fffffc00003ae178>] rcu_barrier+0x1b8/0x4d0
> [<fffffc00003aaa30>] rcu_barrier_handler+0x0/0x120
> [<fffffc00003aaa30>] rcu_barrier_handler+0x0/0x120
> [<fffffc0000858418>] scsi_host_dev_release+0x58/0x170
> [<fffffc000082cf04>] device_release+0x54/0xf0
> [<fffffc0000b501f0>] kobject_put+0x90/0x1b0
> [<fffffc000082d0fc>] put_device+0x1c/0x30
> [<fffffc00008583ac>] scsi_host_put+0x1c/0x30
> [<fffffc00007b9694>] pci_device_remove+0x34/0x90
> [<fffffc0000838284>] device_remove+0x64/0xb0
> [<fffffc0000839d24>] device_release_driver_internal+0x284/0x370
> [<fffffc0000839ecc>] driver_detach+0x7c/0x110
> [<fffffc00008377e8>] bus_remove_driver+0x98/0x160
> [<fffffc000083a754>] driver_unregister+0x44/0xa0
> [<fffffc00007b94e8>] pci_unregister_driver+0x38/0xd0
> [<fffffc00003be264>] sys_delete_module+0x174/0x2f0
> [<fffffc000031095c>] entMM+0x9c/0xc0
> [<fffffc0000310d04>] entSys+0xa4/0xc0
> [<fffffc0000310d04>] entSys+0xa4/0xc0
> 
> Code:
>   f43ffffb
>   6bfa8001
>   47ff041f
>   2ffe0000
>   a4120000  <---  ldq     v0,0(a2) (*first = READ_ONCE(head->first);)
>   60004000  <---  mb
>  <b4110000> <---  stq     v0,0(a1) (new_last->next = first;)
>   60004000

 So `__smp_call_single_queue' is called with NULL `node'.

 Would you be able to trace it back further, e.g. by adding BUG_ON(!node)
to `__smp_call_single_queue' and so on if required, to see where this NULL 
pointer comes from originally?  I do hope such a minimal probe won't 
disturb code generation enough for this to become a heisenbug.

  Maciej