On Tue, Sep 05, 2023 at 09:34:45AM -0400, Mathieu Desnoyers wrote: > On 9/5/23 09:26, Paul E. McKenney wrote: > > On Tue, Sep 05, 2023 at 08:26:51AM -0400, Mathieu Desnoyers wrote: > > > On 9/5/23 05:31, David Laight wrote: > > > > From: Mathieu Desnoyers > > > > > Sent: 04 September 2023 11:24 > > > > > > > > > > On 9/4/23 05:42, Denis Arefev wrote: > > > > > > The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo) > > > > > > is subject to overflow due to a failure to cast operands to a larger > > > > > > data type before performing arithmetic > > > > > > > > > > The patch title should identify more precisely its context, e.g.: > > > > > > > > > > "srcu: Fix srcu_struct node grpmask overflow on 64-bit systems" > > > > > > > > > > Also, as I stated in my reply to the previous version, the patch commit > > > > > message should describe the impact of the bug it fixes. > > > > > > > > And is the analysis complete? > > > > Is 1UL right for 32bit archs?? > > > > Is 64 bits even enough?? > > > > > > I understand from include/linux/rcu_node_tree.h and kernel/rcu/Kconfig > > > RCU_FANOUT and RCU_FANOUT_LEAF ranges that a 32-bit integer is sufficient to > > > hold the mask on 32-bit architectures, and a 64-bit integer is enough on > > > 64-bit architectures given the current implementation. > > > > > > At least this appears to be the intent. I did not do a thorough analysis of > > > the various parameter limits. > > > > Mathieu has it right. > > > > 32-bit kernels are unaffected by this bug. > > > > RCU_FANOUT_LEAF defaults to 16, which means that a 64-bit kernel would > > need more than 32 leaf rcu_node structures for the parent rcu_node > > structure to need more than 32 bit to track its children. This means > > that more than 32*16=512 CPUs are required for this bug to affect 64-bit > > systems. And there really are systems this big, so I am surprised that > > this has not shown up long ago. But it would not be the first time that > > objective reality surprised me. ;-) > > So with a 64-bit kernel, RCU_FANOUT_LEAF=16, if we have exactly 32 leaf > rcu_node structures (exactly 512 CPUs), we end up in the situation where the > type promotion from signed integer (32-bit) to unsigned long will carry the > sign, and thus create a mask of 0xffffffff80000000. If there is someplace that does a shift of 1UL, yes, this would be a problem. I don't see one, but I might have missed something. If all the shifts are from 1, then the top 33 bits would be always set and cleared as a unit, as if they were one bit. > So if this weird mask is indeed an issue we should state that configurations > _starting from 512 CPUs_ are affected, not just those with more than 512 > CPUs. That would instead be more than 512-16=496 CPUs, correct? 496 CPUs would only require a 31-bit shift, which should be OK, but 497 would require a 32-bit shift, which would result in sign extension. If it turns out that sign extension is OK, then we should get in trouble at 513 CPUs, which would result in a 33-bit shift (and is that even defined in C?). But this can be tested. Let's try setting RCU_FANOUT_LEAF to 2. Then we "only" need 64 (maybe 62) CPUs to trigger the bug. And I happen to have access to a system with 80 CPUs. (Those with smaller systems can try to trigger this bug by changing the current "1 <<" to (say) "4 <<", which should trigger it on 16-CPU systems.) I just fired off an 80-CPU test with CONFIG_RCU_FANOUT_LEAF=2 and will let you know how it goes. Thanx, Paul > Thanks, > > Mathieu > > > > > > Thanx, Paul > > > > > Thanks, > > > > > > Mathieu > > > > > > > > > > > David > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Mathieu > > > > > > > > > > > > > > > > > > > > > > Found by Linux Verification Center (linuxtesting.org) with SVACE. > > > > > > > > > > > > Signed-off-by: Denis Arefev <arefev@xxxxxxxxx> > > > > > > --- > > > > > > v2: Added fixes to the srcu_schedule_cbs_snp function as suggested by > > > > > > Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> > > > > > > kernel/rcu/srcutree.c | 4 ++-- > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c > > > > > > index 20d7a238d675..6c18e6005ae1 100644 > > > > > > --- a/kernel/rcu/srcutree.c > > > > > > +++ b/kernel/rcu/srcutree.c > > > > > > @@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags) > > > > > > snp->grplo = cpu; > > > > > > snp->grphi = cpu; > > > > > > } > > > > > > - sdp->grpmask = 1 << (cpu - sdp->mynode->grplo); > > > > > > + sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo); > > > > > > } > > > > > > smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER); > > > > > > return true; > > > > > > @@ -833,7 +833,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp > > > > > > int cpu; > > > > > > > > > > > > for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) { > > > > > > - if (!(mask & (1 << (cpu - snp->grplo)))) > > > > > > + if (!(mask & (1UL << (cpu - snp->grplo)))) > > > > > > continue; > > > > > > srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay); > > > > > > } > > > > > > > > > > -- > > > > > Mathieu Desnoyers > > > > > EfficiOS Inc. > > > > > https://www.efficios.com > > > > > > > > - > > > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > > > > Registration No: 1397386 (Wales) > > > > > > -- > > > Mathieu Desnoyers > > > EfficiOS Inc. > > > https://www.efficios.com > > > > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com >