On Wed, Dec 18, 2024 at 08:33:09PM +0100, Magnus Lindholm wrote: > Hi again, > > So, various versions of GCC can trigger/untrigger the RCU bugs that > I've been hitting. This may of course still be due to GCC miscompiling > some code on alpha but that said I've taken another look at the code > involved when this is triggered. From what I've seen so far, this > happens when kernel threads are used in the code to access structured > data stored on the stack (structs). The quote below is from a comment > in the kernel code (include/linux/percpu-defs.h) > > "s390 and alpha modules require percpu variables to be defined as weak > to force the compiler to generate GOT based external references for > them. This is necessary because percpu sections will be located > outside of the usually addressable area. This definition puts the > following two extra restrictions when defining percpu variables. > > 1. The symbol must be globally unique, even the static ones. > > 2. Static percpu variables cannot be defined inside a function." > > > Taking these notes into account I've made some minor modifications to > the code (briefly described below). The modifications involve > declaring some structs previously placed on stack as > DEFINE_PER_CPU_SHARED_ALIGNED. This is already done for some strucs in > rcu/smp code. Making these modifications fixes the problem and I can > build the kernel with GCC versions that previously triggered the bug. > The same approach fixed both the network interface-rename bug as well > as the scsi module unload bug. > > in kernel/smp/smp.c > ------------------- > #if defined(ARCH_NEEDS_WEAK_PER_CPU) > smp_call_function_single(...) use csd = this_cpu_ptr(&csd_data) > regardless if its called with wait = 0 or 1. Make sure to declare > csd_data as "DEFINE_PER_CPU_SHARED_ALIGNED" > #endif If I understand your proposed changes correctly, this could increase the amount of time CPUs spend waiting for for the csd_data to become free. This would be a very unwelcome development in some quarters. > in kernel/rcu/tree.c > -------------------- > #use rew_data instead of stack allocated struct > static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_exp_work, rew_data); I am not finding this one. > in kernel/rcu/tree_exp.h > ------------------------ > #if defined(ARCH_NEEDS_WEAK_PER_CPU) > void synchronize_rcu_expedited(void) > > in stead of: > struct rcu_exp_work rew; > > do: > #if defined(ARCH_NEEDS_WEAK_PER_CPU) > struct rcu_exp_work *rewp; > rewp=this_cpu_ptr(&rew_data); > > rew_data is declared "DEFINE_PER_CPU_SHARED_ALIGNED" > > #endif OK, I *can* find this one, thank you. At least assuming you mean the local variable "rew" defined in synchronize_rcu_expedited(). I don't see how this relates to the per-CPU variable restrictions called out above relates here. We do not have any sort of per-CPU variable here, but instead a simple structure allocated on the stack. And a relatively small structure at that. In addition, making this be a per-CPU variable will break if a given CPU invokes a second synchronize_rcu_expedited() before its first call to synchronize_rcu_expedited() returns. You will end up with the second call clobbering that per-CPU variable, which will likely fatally confuse the workqueue when attempting to manage the resulting workqueue handlers. You might need heavy load to make this happen, but I don't see anything preventing it from happening. So what am I missing here? Thanx, Paul