Re: Kernel Oops on alpha with kernel version >=6.9.x

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 18, 2024 at 08:33:09PM +0100, Magnus Lindholm wrote:
> Hi again,
> 
> So, various versions of GCC can trigger/untrigger the RCU bugs that
> I've been hitting. This may of course still be due to GCC miscompiling
> some code on alpha but that said I've taken another look at the code
> involved when this is triggered. From what I've seen so far, this
> happens when kernel threads are used in the code to access structured
> data stored on the stack (structs). The quote below is from a comment
> in the kernel code (include/linux/percpu-defs.h)
> 
> "s390 and alpha modules require percpu variables to be defined as weak
> to force the compiler to generate GOT based external references for
> them. This is necessary because percpu sections will be located
> outside of the usually addressable area. This definition puts the
> following two extra restrictions when defining percpu variables.
> 
> 1. The symbol must be globally unique, even the static ones.
> 
> 2. Static percpu variables cannot be defined inside a function."
> 
> 
> Taking these notes into account I've made some minor modifications to
> the code (briefly described below). The modifications involve
> declaring some structs previously placed on stack as
> DEFINE_PER_CPU_SHARED_ALIGNED. This is already done for some strucs in
> rcu/smp code. Making these modifications fixes the problem and I can
> build the kernel with GCC versions that previously triggered the bug.
> The same approach fixed both the network interface-rename bug as well
> as the scsi module unload bug.
> 
> in kernel/smp/smp.c
> -------------------
> #if  defined(ARCH_NEEDS_WEAK_PER_CPU)
> smp_call_function_single(...) use csd = this_cpu_ptr(&csd_data)
> regardless if its called with wait = 0 or 1. Make sure to declare
> csd_data as "DEFINE_PER_CPU_SHARED_ALIGNED"
> #endif

If I understand your proposed changes correctly, this could increase
the amount of time CPUs spend waiting for for the csd_data to become free.
This would be a very unwelcome development in some quarters.

> in kernel/rcu/tree.c
> --------------------
> #use rew_data instead of stack allocated struct
> static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_exp_work, rew_data);

I am not finding this one.

> in kernel/rcu/tree_exp.h
> ------------------------
> #if  defined(ARCH_NEEDS_WEAK_PER_CPU)
> void synchronize_rcu_expedited(void)
> 
> in stead of:
> struct rcu_exp_work rew;
> 
> do:
> #if  defined(ARCH_NEEDS_WEAK_PER_CPU)
> struct rcu_exp_work *rewp;
> rewp=this_cpu_ptr(&rew_data);
> 
> rew_data is declared "DEFINE_PER_CPU_SHARED_ALIGNED"
> 
> #endif

OK, I *can* find this one, thank you.  At least assuming you mean the
local variable "rew" defined in synchronize_rcu_expedited().

I don't see how this relates to the per-CPU variable restrictions called
out above relates here.  We do not have any sort of per-CPU variable here,
but instead a simple structure allocated on the stack.  And a relatively
small structure at that.

In addition, making this be a per-CPU variable will break if a given
CPU invokes a second synchronize_rcu_expedited() before its first call
to synchronize_rcu_expedited() returns.  You will end up with the second
call clobbering that per-CPU variable, which will likely fatally confuse
the workqueue when attempting to manage the resulting workqueue handlers.
You might need heavy load to make this happen, but I don't see anything
preventing it from happening.

So what am I missing here?

							Thanx, Paul




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux