> Umm, no. The psABI guarantees 16-byte alignment for the stack pointer,
> and under this condition (((x - 17) & ~31) + 32 <= x) is guaranteed to be
> true (except for the overflow case, of course, which does not apply here).

Aha! That explains it, thanks! Is the psABI available somewhere?

>
> Would you be able to trace it back further, e.g. by adding BUG_ON(!node)
> to `__smp_call_single_queue' and so on if required, to see where this NULL
> pointer comes from originally? I do hope such a minimal probe won't
> disturb code generation enough for this to become a heisenbug.
>

Hi,

below are some additional tests that I've made. It seems to me that part of the stack is overwritten with the values of other local variables. Previously this affected the return address on the stack, causing kernel Oopses on function return (see my previous mail in this thread). In this run it seems that when the pointer csd in smp_call_function_single is stored on the stack, it gets overwritten by writes to csd_stack.info.

The difference here is that I am using GCC 15.0.0 20241225 (experimental) instead of gcc (Gentoo 14.2.1_p20241116 p3) 14.2.1. To me, this looks like the same problem; the clobbering just hits a different part of the stack.

Below is some debug output. I've added some print statements to smp_call_function_single in smp.c.
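
To cross-check both the quoted alignment condition and the addresses in the output, here is a small userspace sketch in C. It assumes a 64-bit (LP64) target and uses a hypothetical mirror struct, csd_mirror, standing in for struct __call_single_data: a 16-byte node followed by func and info. That layout is my assumption, not the kernel header, but it agrees with the logs: in all three runs the csd printed by generic_exec_single is exactly 0x18 below the matching &csd_stack.info (bc38/bc50, 7c40/7c58, 7c20/7c38). The helper names quoted_condition, holds_for_16_aligned and inside_object are made up for illustration.

```c
#include <assert.h>   /* for checking the helpers below */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The sketch assumes 64-bit pointers, as on the 64-bit Alpha kernel. */
_Static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");

/*
 * Hypothetical userspace mirror of struct __call_single_data:
 * a 16-byte node (list pointer, 32-bit flags, two u16 CPU ids),
 * followed by func and info. Not the real kernel definition.
 */
struct csd_mirror {
	void *llist_next;      /* node.llist */
	uint32_t u_flags;      /* node.u_flags */
	uint16_t src, dst;     /* node.src, node.dst */
	void (*func)(void *);  /* func */
	void *info;            /* info: offset 0x18 with this layout */
};

/* The condition from the quoted reply: ((x - 17) & ~31) + 32 <= x. */
static bool quoted_condition(uint64_t x)
{
	return ((x - 17) & ~(uint64_t)31) + 32 <= x;
}

/*
 * Check the condition for every 16-byte-aligned x in [lo, hi).
 * lo must itself be 16-byte aligned and >= 17 so x - 17 cannot wrap.
 */
static bool holds_for_16_aligned(uint64_t lo, uint64_t hi)
{
	for (uint64_t x = lo; x < hi; x += 16)
		if (!quoted_condition(x))
			return false;
	return true;
}

/* True if addr falls inside the object occupying [base, base + size). */
static bool inside_object(uint64_t addr, uint64_t base, size_t size)
{
	return addr >= base && addr < base + size;
}
```

Taking the csd printed by generic_exec_single as the address of csd_stack, the failing run has &csd = fffffc0006207c58 inside the 32 bytes starting at fffffc0006207c40, exactly where info lands. In the two working runs &csd lies outside csd_stack: fffffc000935bc58 is just past the end of the object at fffffc000935bc38, and fffffc0009e07c50 is past the one at fffffc0009e07c20. The quoted inequality holds for every 16-byte-aligned x but fails, for example, for x = 3217 (x mod 32 = 17).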

When csd_stack is declared as "struct ____cacheline_aligned_in_smp __call_single_data csd_stack", the code works:

unloading the scsi module:
--------------------------------------------------
smp: &csd_stack.info=fffffc000493fd90 &csd=fffffc000493fd98
smp: &csd_stack.info=fffffc000493fd90 &csd=fffffc000493fd98
sd 6:0:1:0: [sdb] Synchronizing SCSI cache
rcu: rcu_barrier: cpu=0
smp: &csd_stack.info=fffffc000935bc50 &csd=fffffc000935bc58
rcu: rcu_barrier: cpu=1
smp: &csd_stack.info=fffffc000935bc50 &csd=fffffc000935bc58
rcu: rcu_barrier: cpu=2
smp: &csd_stack.info=fffffc000935bc50 &csd=fffffc000935bc58
smp: generic_exec_single: csd=fffffc000935bc38 cpu=2 smp_cpu=2

Below is the same debug output when csd_stack is declared as "call_single_data_t csd_stack" (i.e. no patch applied). For some reason, in this case, the address of the csd variable is the same as the address of csd_stack.info. If that is really so, it is no wonder that a write to csd_stack.info overwrites the csd pointer. In this case the code fails:

unloading the scsi module:
-----------------------------------------
smp: &csd_stack.info=fffffc000493fd98 &csd=fffffc000493fd98
smp: smp_call_function_single: not wait smp_cpu=1
sd 6:0:1:0: [sdb] Synchronizing SCSI cache
rcu: rcu_barrier: cpu=0
smp: &csd_stack.info=fffffc0006207c58 &csd=fffffc0006207c58
smp: generic_exec_single: csd=fffffc0006207c40 cpu=0 smp_cpu=0
Unable to handle kernel paging request at virtual address 0000000000000008
CPU 0
rmmod(1443): Oops 0
pc = [<fffffc00003dd564>] ra = [<fffffc00003dd558>] ps = 0000 Not tainted
pc is at smp_call_function_single+0x204/0x220
ra is at smp_call_function_single+0x1f8/0x220

Below is yet another test, again with csd_stack declared as "call_single_data_t csd_stack" (i.e. no patch applied). This time the code works, since I've added some extra "dummy variables" on the stack, which seem to steer things around enough.
Here it's also clear that the address of csd does not overlap with the address of csd_stack.info. test0 and test1 are just the extra local variables that I've added.

-----------------------------------------
smp: &csd_stack.info=fffffc000493fd78 &csd=fffffc000493fd90
smp: smp_call_function_single: not wait smp_cpu=1
smp: &test0=fffffc000493fd98
smp: &test1=fffffc000493fd88
sd 6:0:1:0: [sdb] Synchronizing SCSI cache
rcu: rcu_barrier: cpu=0
smp: &csd_stack.info=fffffc0009e07c38 &csd=fffffc0009e07c50
smp: &test0=fffffc0009e07c58
smp: &test1=fffffc0009e07c48
smp: generic_exec_single: csd=fffffc0009e07c20 cpu=0 smp_cpu=0

Patch I used to "fix" kernel/smp.c
----------------------------------------------------
+++ kernel/smp.c	2024-12-19 19:01:20.592819628 +0100
@@ -631,7 +631,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 			     int wait)
 {
 	call_single_data_t *csd;
-	call_single_data_t csd_stack = {
+	struct ____cacheline_aligned_in_smp __call_single_data csd_stack = {
 		.node = { .u_flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC, },
 	};
 	int this_cpu;

/Magnus