On Thu, Nov 03, 2022 at 09:13:13PM +0800, Pingfan Liu wrote:
> Clarify at first:
> This issue was found purely by code inspection, not hit in practice.
> 
> Scene:
> 
> __srcu_read_(un)lock() uses the percpu variables srcu_(un)lock_count[2].
> Normally, percpu variables avoid the non-atomic RMW issue, but in some
> rare cases they cannot.
> 
> Supposing that __srcu_read_lock() runs on cpuX, the statement
> 
> 	this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
> 
> can be decomposed into two sub-steps:
> -1. get the address of this_cpu_ptr(ssp->sda)->srcu_lock_count[idx],
>     denoted as addressX, and let unsigned long *pX = addressX;
> -2. *pX = *pX + 1;

It's not supposed to happen:

* The weak (generic) version of this_cpu_inc() disables interrupts for
  the whole operation.
* x86 adds directly to gs/fs-based percpu memory in a single
  instruction.
* arm64, loongarch, and s390 disable preemption around the RMW.

This has to be a fundamental constraint of any this_cpu_*() ops
implementation; see the sketches at the end of this mail.

Thanks.

> 
> Now, assuming there are two tasks, denoted taskA and taskB, and three
> cpus, denoted cpuX, cpuY, and cpuZ. Let both taskA and taskB finish
> the first sub-step above on cpuX, but migrate to cpuY and cpuZ
> respectively just before the second one. Then both of them continue
> to execute concurrently from the second sub-step. This raises a
> typical non-atomic RMW issue.
> 
> Solution:
> 
> This issue can be tackled by disabling preemption around the two
> sub-steps.
> 
> Signed-off-by: Pingfan Liu <kernelfans@xxxxxxxxx>
> Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx>
> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> To: rcu@xxxxxxxxxxxxxxx
> ---
>  kernel/rcu/srcutree.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 1c304fec89c0..a38a3779dc01 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -636,7 +636,11 @@ int __srcu_read_lock(struct srcu_struct *ssp)
>  	int idx;
>  
>  	idx = READ_ONCE(ssp->srcu_idx) & 0x1;
> +	__preempt_count_inc();
> +	barrier();
>  	this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
> +	barrier();
> +	__preempt_count_dec();
>  	smp_mb(); /* B */  /* Avoid leaking the critical section. */
>  	return idx;
>  }
> @@ -650,7 +654,11 @@ EXPORT_SYMBOL_GPL(__srcu_read_lock);
>  void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
>  {
>  	smp_mb(); /* C */  /* Avoid leaking the critical section. */
> +	__preempt_count_inc();
> +	barrier();
>  	this_cpu_inc(ssp->sda->srcu_unlock_count[idx]);
> +	barrier();
> +	__preempt_count_dec();
>  }
>  EXPORT_SYMBOL_GPL(__srcu_read_unlock);
> 
> -- 
> 2.31.1
> 
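For reference, here is a minimal sketch of the generic (weak) fallback,
modeled on the this_cpu_generic_to_op() pattern in
include/linux/percpu-defs.h (simplified, not quoted verbatim; details
vary across kernel versions):

	/*
	 * Generic this_cpu RMW fallback (sketch): both the address
	 * computation (raw_cpu_ptr()) and the read-modify-write run
	 * with interrupts off, so neither preemption nor migration
	 * can split them.
	 */
	#define this_cpu_generic_to_op(pcp, val, op)			\
	do {								\
		unsigned long __flags;					\
		raw_local_irq_save(__flags);				\
		*raw_cpu_ptr(&(pcp)) op val;				\
		raw_local_irq_restore(__flags);				\
	} while (0)

With interrupts off from the address computation through the write-back,
a task can never be migrated between sub-steps 1 and 2 above.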
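And on x86 the same guarantee comes for free: the increment is a single
gs/fs-prefixed instruction. A heavily simplified sketch of the arch
override (modeled on arch/x86/include/asm/percpu.h; the name
this_cpu_inc_x86 and the operand handling are illustrative only, not
the kernel's actual macros):

	/*
	 * x86 folds the percpu address computation and the RMW into a
	 * single "incq %gs:..." instruction, so an interrupt or a
	 * migration can only land before or after it, never in the
	 * middle.
	 */
	#define this_cpu_inc_x86(pcp)					\
		asm volatile ("incq %%gs:%0" : "+m" (pcp))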