On Tue, Sep 05, 2023 at 09:43:04AM -0400, Mathieu Desnoyers wrote: > On 9/5/23 09:38, Paul E. McKenney wrote: > > On Tue, Sep 05, 2023 at 08:57:53AM -0400, Mathieu Desnoyers wrote: > > > On 9/4/23 09:58, Paul E. McKenney wrote: > > > > On Mon, Sep 04, 2023 at 08:58:48AM -0400, Mathieu Desnoyers wrote: > > > > > On 9/4/23 08:42, Mathieu Desnoyers wrote: > > > > > > On 9/4/23 08:21, Denis Arefev wrote: > > > > > > > The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo) > > > > > > > is subject to overflow due to a failure to cast operands to a larger > > > > > > > data type before performing arithmetic. > > > > > > > > > > > > > > The maximum result of this subtraction is defined by the RCU_FANOUT > > > > > > > or other srcu level-spread values assigned by rcu_init_levelspread(), > > > > > > > which can indeed cause the signed 32-bit integer literal ("1") to > > > > > > > overflow > > > > > > > when shifted by any value greater than 31. > > > > > > > > > > > > We could expand on this: > > > > > > > > > > > > The maximum result of this subtraction is defined by the RCU_FANOUT > > > > > > or other srcu level-spread values assigned by rcu_init_levelspread(), > > > > > > which can indeed cause the signed 32-bit integer literal ("1") to overflow > > > > > > when shifted by any value greater than 31 on a 64-bit system. > > > > > > > > > > > > Moreover, when the subtraction value is 31, the 1 << 31 expression results > > > > > > in 0xffffffff80000000 when the signed integer is promoted to unsigned long > > > > > > on 64-bit systems due to type promotion rules, which is certainly not the > > > > > > intended result. > > > > > > > > Thank you both! Could you please also add something to the effect of: > > > > "Given default Kconfig options, this bug affects only systems with more > > > > than 512 CPUs."? > > > > > > Hi Paul, > > > > > > I'm trying to understand this "NR_CPUS > 512 CPUs" default Kconfig lower > > > bound from kernel/rcu/Kconfig and rcu_node_tree.h. Is that on a 32-bit or > > > 64-bit architecture ? Also, I suspect that something like x86-64 MAXSMP (or > > > an explicit NR_CPUS) needs to be selected over a default Kconfig to support > > > that many CPUs. > > > > 64-bit only. I believe that 32-bit kernels are unaffected by this bug. > > > > The trick is that RCU reshapes the rcu_node tree in rcu_init_geometry(), > > which is invoked during early boot from rcu_init(). This reshaping is > > based on nr_cpu_ids. So if NR_CPUS is (say) 4096, there will be enough > > rcu_node structures allocated at build time to accommodate 4096 CPUs > > (259 of them, 256 leaf nodes, four internal nodes, and one root node), > > but only assuming dense numbering of CPUs. If rcu_init_geometry() sees > > that nr_cpu_ids is (say) 64, it will use only five of them, that is, > > four leaf nodes and one root node. The leaf nodes will need to shift > > by at most 16, and the root node by at most 4. > > > > But the possibility of sparse CPU numbering (perhaps to your point) > > means that the bug can occur in 64-bit kernels booted on systems with > > 512 CPUs or fewer if that system has sparse CPU IDs. For example, > > there have been systems that disable all but one hardware thread per > > core, but leave places in the CPU numbering for those disabled threads. > > Such a system with four hardware threads per core could have a CPU 516 > > (and thus be affected by this bug) with as few as 129 CPUs. > > > > So a better request would be for something like: "Given default Kconfig > > options, this bug affects only 64-bit systems having at least one CPU > > for which smp_processor_id() returns 512 or greater." > > > > Does that help, or am I missing your point? > > This is a good point, although not the one I was trying to make. See my > explanation about impact of having exactly 512 wrt signed integer type > promotion in a separate email. So your last phrasing "returns 512 or > greater" is better. Previously it appeared that only systems with _more > than_ 512 cpus were affected, which was off-by-one considering that systems > with exactly 512 cpus are an issue as well. If 512 CPUs is an issue, then so also is 497 CPUs. Both result in a 32-bit shift. If having the upper 33 bits act as a unit is OK (and I *think* that it is), then you need that 513th CPU (or, better, a CPU whose smp_processor_id() return value is at least 513) to make something bad happen. I would also be OK with noting that with 497 or more CPUs, strange things start happening. My intent was definitely that there only be a single bit set in sdp->grpmask, after all! The fact that it might (or might not) happen to work notwithstanding. ;-) Thanx, Paul > Thanks, > > Mathieu > > > > > > Thanx, Paul > > > > > Thanks, > > > > > > Mathieu > > > > > > > > > > > > > > Thanx, Paul > > > > > > > > > > > Found by Linux Verification Center (linuxtesting.org) with SVACE. > > > > > > > > > > > > With the commit message updated with my comment above, please also add: > > > > > > > > > > > > Fixes: c7e88067c1 ("srcu: Exact tracking of srcu_data structures > > > > > > containing callbacks") > > > > > > Cc: <stable@xxxxxxxxxxxxxxx> # v4.11 > > > > > > > > > > Sorry, the line above should read: > > > > > > > > > > Cc: <stable@xxxxxxxxxxxxxxx> # v4.11+ > > > > > > > > > > Thanks, > > > > > > > > > > Mathieu > > > > > > > > > > > Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> > > > > > > > > > > > > Thanks! > > > > > > > > > > > > Mathieu > > > > > > > > > > > > > > > > > > > > Signed-off-by: Denis Arefev <arefev@xxxxxxxxx> > > > > > > > --- > > > > > > > v3: Changed the name of the patch, as suggested by > > > > > > > Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> > > > > > > > v2: Added fixes to the srcu_schedule_cbs_snp function as suggested by > > > > > > > Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> > > > > > > > kernel/rcu/srcutree.c | 4 ++-- > > > > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > > > > > > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c > > > > > > > index 20d7a238d675..6c18e6005ae1 100644 > > > > > > > --- a/kernel/rcu/srcutree.c > > > > > > > +++ b/kernel/rcu/srcutree.c > > > > > > > @@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct > > > > > > > srcu_struct *ssp, gfp_t gfp_flags) > > > > > > > snp->grplo = cpu; > > > > > > > snp->grphi = cpu; > > > > > > > } > > > > > > > - sdp->grpmask = 1 << (cpu - sdp->mynode->grplo); > > > > > > > + sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo); > > > > > > > } > > > > > > > smp_store_release(&ssp->srcu_sup->srcu_size_state, > > > > > > > SRCU_SIZE_WAIT_BARRIER); > > > > > > > return true; > > > > > > > @@ -833,7 +833,7 @@ static void srcu_schedule_cbs_snp(struct > > > > > > > srcu_struct *ssp, struct srcu_node *snp > > > > > > > int cpu; > > > > > > > for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) { > > > > > > > - if (!(mask & (1 << (cpu - snp->grplo)))) > > > > > > > + if (!(mask & (1UL << (cpu - snp->grplo)))) > > > > > > > continue; > > > > > > > srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay); > > > > > > > } > > > > > > > > > > > > > > > > -- > > > > > Mathieu Desnoyers > > > > > EfficiOS Inc. > > > > > https://www.efficios.com > > > > > > > > > > > -- > > > Mathieu Desnoyers > > > EfficiOS Inc. > > > https://www.efficios.com > > > > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com >