Hi again.

I've been running some more tests, this time with an SMP kernel but on a
system with just one CPU, which seems to me a somewhat simpler scenario to
analyze. I've added some print statements to smp_call_function_single,
just to see what's really going on:

  pr_warn("smp_call_function_single: %llx %llx size=%d\n",&csd_stack,&csd, sizeof(call_single_data_t));

The output is seen below:

  smp: smp_call_function_single: fffffc000493fc40 fffffc000493fc58 size=32

So the csd_stack struct is 32 bytes in size, but &csd - &csd_stack = 24,
i.e. csd starts inside the last 8 bytes of csd_stack. This does not make
any sense.

  pr_warn("\n&csd_stack.info=%lx\n&csd=%lx\n",&csd_stack.info,&csd);

Output:

  smp:
  &csd_stack.info=fffffc000493fc58
  &csd=fffffc000493fc58

Here the csd variable has the same address on the stack as csd_stack.info.

Using the above information and looking at the disassembly of
smp_call_function_single in smp.o, I've put together the following table
mapping out the stack of smp_call_function_single:

  $sp+0
  ra
  $sp+8    s0
  $sp+16
  $sp+24
  $sp+32   csd_stack.node   0xfffffc000493fc40
  $sp+40   csd_stack.node   0xfffffc000493fc48
  $sp+48   csd_stack.func   0xfffffc000493fc50
  $sp+56   csd_stack.info   0xfffffc000493fc58
  $sp+64   csd              0xfffffc000493fc58
  $sp+72
  $sp+80   a3
  $sp+88   a2
  $sp+96   a0
  $sp+104  a1
  $sp+112  -

When requesting csd_stack to be aligned using
__attribute__((__aligned__(x))), it seems as if the compiler does not leave
enough room above the csd_stack struct. That is, since the exact location
of csd_stack depends on the actual value of $sp, it is not known at compile
time, and it seems like gcc does not take this into account. The code works
fine if I remove the alignment attribute for csd_stack.
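The overlap described above can also be checked outside the kernel. What
follows is only a sketch under assumptions, not kernel code: struct csd_like
is a hypothetical 32-byte stand-in for call_single_data_t, and
ranges_overlap() and check_stack_layout() are illustrative names. It takes
the address of a pointer local next to a 32-byte-aligned local, much like
the pr_warn calls do, and reports whether the two share stack bytes.

```c
#include <assert.h>	/* static_assert */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for call_single_data_t: 32 bytes, 32-byte aligned. */
struct csd_like {
	void *node[2];		/* plays the role of csd_stack.node */
	void (*func)(void *);	/* plays the role of csd_stack.func */
	void *info;		/* plays the role of csd_stack.info */
} __attribute__((__aligned__(32)));

static_assert(sizeof(struct csd_like) == 32, "stand-in must be 32 bytes");

/* Returns 1 if [a, a + a_len) and [b, b + b_len) share at least one byte. */
static int ranges_overlap(uint64_t a, uint64_t a_len,
			  uint64_t b, uint64_t b_len)
{
	return a < b + b_len && b < a + a_len;
}

/*
 * Returns 0 if the frame looks sane, otherwise a bitmask:
 *   1 = csd_stack is not 32-byte aligned
 *   2 = the csd pointer slot overlaps csd_stack
 *   4 = writing csd_stack.info changed the csd pointer
 */
__attribute__((noinline)) int check_stack_layout(void)
{
	/* volatile keeps both locals in memory so their addresses are real */
	volatile struct csd_like *volatile csd;
	volatile struct csd_like csd_stack = { .info = NULL };
	int bad = 0;

	csd = &csd_stack;
	if ((uintptr_t)&csd_stack % 32 != 0)
		bad |= 1;
	if (ranges_overlap((uintptr_t)&csd_stack, sizeof(csd_stack),
			   (uintptr_t)&csd, sizeof(csd)))
		bad |= 2;
	csd_stack.info = (void *)&bad;	/* this store must not touch csd */
	if (csd != &csd_stack)
		bad |= 4;
	return bad;
}
```

With the addresses printed above, ranges_overlap(0xfffffc000493fc40, 32,
0xfffffc000493fc58, 8) is true, which is the clobbering case. Whether
check_stack_layout() itself returns non-zero on alpha is untested here; that
depends on the compiler producing a similar frame layout in userspace.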

Also, as previously mentioned, declaring csd_stack as
"struct ____cacheline_aligned_in_smp" makes it work, but judging from the
disassembly, this declaration has no effect on the alignment of csd_stack,
i.e. csd_stack is not aligned to anything; it's simply placed on the stack,
indirectly making it just 16-byte aligned instead of the requested 32-byte
alignment. It seems to me that, when used to align variables that reside on
the stack, __attribute__((__aligned__(x))) does not work correctly with
gcc/alpha/linux.

/Magnus

On Tue, Dec 31, 2024 at 11:43 AM Magnus Lindholm <linmag7@xxxxxxxxx> wrote:
>
> > Umm, no. The psABI guarantees 16-byte alignment for the stack pointer,
> > and under this condition (((x - 17) & ~31) + 32 <= x) is guaranteed to be
> > true (except for the overflow case, of course, which does not apply here).
>
> aha! that explains it! thanks, is the psABI available somewhere?
>
> > Would you be able to trace it back further, e.g. by adding BUG_ON(!node)
> > to `__smp_call_single_queue' and so on if required, to see where this NULL
> > pointer comes from originally? I do hope such a minimal probe won't
> > disturb code generation enough for this to become a heisenbug.
>
> Hi, below are some additional tests that I've made,
>
> It seems to me that part of the stack is overwritten with the values
> of other local variables. Previously this affected the return address
> on the stack, causing a kernel Oops on function return (see previous
> mail in thread). In this run it seems like when the pointer *csd in
> smp_call_function_single is stored on the stack, it gets overwritten
> by writes to csd_stack.info. The difference here is that I use GCC
> 15.0.0 20241225 (experimental) instead of gcc (Gentoo 14.2.1_p20241116
> p3) 14.2.1. To me, this looks like the same problem but the clobbering
> just hits a different part of the stack.
> Below is some debug-output,
> where I've added some print statements to the code in
> smp_call_function_single of smp.c. When csd_stack is declared as
> "struct ____cacheline_aligned_in_smp __call_single_data csd_stack",
> this first case is the case where the code works:
>
> unloading the scsi module:
> --------------------------------------------------
> smp:
> &csd_stack.info=fffffc000493fd90
> &csd=fffffc000493fd98
> smp:
> &csd_stack.info=fffffc000493fd90
> &csd=fffffc000493fd98
> sd 6:0:1:0: [sdb] Synchronizing SCSI cache
> rcu: rcu_barrier: cpu=0
> smp:
> &csd_stack.info=fffffc000935bc50
> &csd=fffffc000935bc58
> rcu: rcu_barrier: cpu=1
> smp:
> &csd_stack.info=fffffc000935bc50
> &csd=fffffc000935bc58
> rcu: rcu_barrier: cpu=2
> smp:
> &csd_stack.info=fffffc000935bc50
> &csd=fffffc000935bc58
> smp: generic_exec_single: csd=fffffc000935bc38 cpu=2 smp_cpu=2
>
> Below is the same debug output when csd_stack is declared as
> "call_single_data_t csd_stack" (i.e. no patch applied). For some
> reason, in this case, the address of the csd variable is the same as
> the address of csd_stack.info. If this is really the case, no wonder
> that a write to csd_stack.info will overwrite the csd pointer.
> In this case the code fails as shown below:
>
> unloading the scsi module:
> -----------------------------------------
> smp:
> &csd_stack.info=fffffc000493fd98
> &csd=fffffc000493fd98
> smp: smp_call_function_single: not wait smp_cpu=1
> sd 6:0:1:0: [sdb] Synchronizing SCSI cache
> rcu: rcu_barrier: cpu=0
> smp:
> &csd_stack.info=fffffc0006207c58
> &csd=fffffc0006207c58
> smp: generic_exec_single: csd=fffffc0006207c40 cpu=0 smp_cpu=0
> Unable to handle kernel paging request at virtual address 0000000000000008
> CPU 0
> rmmod(1443): Oops 0
> pc = [<fffffc00003dd564>] ra = [<fffffc00003dd558>] ps = 0000 Not tainted
> pc is at smp_call_function_single+0x204/0x220
> ra is at smp_call_function_single+0x1f8/0x220
>
> Below is yet another test. csd_stack is declared as
> "call_single_data_t csd_stack" (i.e. no patch applied), yet in this
> example the code works, since I've added some extra "dummy variables"
> on the stack which seem to steer things around enough. Here it's also
> clear that the address of csd does not overlap with the address of
> csd_stack.info. test0 and test1 are just the extra local variables
> that I've added.
>
> -----------------------------------------
> smp:
> &csd_stack.info=fffffc000493fd78
> &csd=fffffc000493fd90
> smp: smp_call_function_single: not wait smp_cpu=1
> smp: &test0=fffffc000493fd98
> smp: &test1=fffffc000493fd88
> sd 6:0:1:0: [sdb] Synchronizing SCSI cache
> rcu: rcu_barrier: cpu=0
> smp:
> &csd_stack.info=fffffc0009e07c38
> &csd=fffffc0009e07c50
> smp: &test0=fffffc0009e07c58
> smp: &test1=fffffc0009e07c48
> smp: generic_exec_single: csd=fffffc0009e07c20 cpu=0 smp_cpu=0
>
> Patch I used to "fix" kernel/smp.c
> ----------------------------------------------------
> +++ kernel/smp.c 2024-12-19 19:01:20.592819628 +0100
> @@ -631,7 +631,7 @@
>  int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
>  int wait)
>  {
>      call_single_data_t *csd;
> -    call_single_data_t csd_stack = {
> +    struct ____cacheline_aligned_in_smp __call_single_data csd_stack = {
>          .node = { .u_flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC, },
>      };
>      int this_cpu;
>
> /Magnus