Re: [PATCH v2] The value may overflow

On 9/5/23 15:27, Paul E. McKenney wrote:
On Tue, Sep 05, 2023 at 09:40:51AM -0700, Paul E. McKenney wrote:
On Tue, Sep 05, 2023 at 10:34:43AM -0400, Mathieu Desnoyers wrote:
On 9/5/23 10:15, David Laight wrote:
...
That would instead be more than 512-16=496 CPUs, correct?  496 CPUs would
only require a 31-bit shift, which should be OK, but 497 would require
a 32-bit shift, which would result in sign extension.  If it turns out
that sign extension is OK, then we should get in trouble at 513 CPUs,
which would result in a 33-bit shift (and is that even defined in C?).

Not quite right :-)

(1 << 31) is int and negative, that gets sign extended before
being converted to 'unsigned long' - so has the top 33 bits set.

Good point, thank you for the correction.

(1 << 32) is undefined; current x86 CPUs ignore the high
shift bits, so it ends up as (1 << 0).

Which would cause it to attempt to invoke SRCU callbacks on the
lowest-numbered CPU associated with that srcu_node structure.

Yes, I was about to reply with the same thing. A shift of 31 is buggy,
because 1 << 31 sets the sign bit, which in turn sets the top 33 bits
when converted to unsigned long. A shift such as 1 << 32 is undefined,
with x86, for instance, ignoring the high bits of the shift count.
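
For illustration, here is a minimal user-space sketch (mine, not kernel
code) of the difference, assuming an LP64 target and typical compiler
behavior (strictly speaking, 1 << 31 on a 32-bit int is already
undefined per the C standard):

	#include <stdio.h>

	int main(void)
	{
		unsigned long buggy = 1 << 31;   /* int shift, sign-extended */
		unsigned long fixed = 1UL << 31; /* unsigned long shift */

		printf("%lx\n", buggy);          /* ffffffff80000000 */
		printf("%lx\n", fixed);          /* 80000000 */
		return 0;
	}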

But in order to have a 1 << 31 shift from this expression:

                 sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);

I *think* we need the group to have 32 cpus or more (indexed within
the group from grplo to grplo + 31 (both inclusive)).

So as soon as we have one group with 32 cpus, the problem should show
up. With FANOUT_LEAF=16, we can have 15 groups of 31 cpus and 1
group of 32 cpus, for:

   15*31 + 32 = 497 cpus.

AFAIU, this would be the minimum value for smp_processor_id()+1 which
triggers this issue.

By default, there are 16 CPUs per leaf srcu_node structure.  Each leaf
srcu_node structure takes up one bit in the parent srcu_node structures'
srcu_data_have_cbs[] array.  Up to 30 bits is OK, 31 bits is questionable,
and 32 bits and larger are erroneous.

This is the situation as I see it (assuming dense CPU numbering):

	# Leaf srcu_node		Largest
	structures	#CPUs		CPU #		Status

	0-30		0-480		479		Good
	31		481-496		495		Questionable
	32-		497-		496+		Bad.

Tree SRCU differs from Tree RCU in its operation, so this bug should
not hold up SRCU grace periods.  It might instead cause SRCU callbacks
to be ignored (which would admittedly look quite similar to the user).

My attempts to cheat the numbering system ran up against the limited
height of the srcu_node tree.

But there is still the question of why this has not been seen in the
wild, given that there really are systems with more than 479 CPUs
out there.  One possibility is the call to srcu_schedule_cbs_sdp()
from srcu_funnel_gp_start().  But it does not seem likely that this
would happen all that often, as it requires back-to-back grace periods
and then some.

Maybe people with large systems boot with srcutree.convert_to_big=0?

Adding Laurent for his thoughts.

Aha!

Here is what makes it work, given David's description of 1<<32:

static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp,
				  unsigned long mask, unsigned long delay)
{
	int cpu;

	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
		if (!(mask & (1 << (cpu - snp->grplo))))
			continue;
		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
	}
}

As long as at least one bit is set in the result of 1<<N, and as long
as the compiler always does the same thing for a given N, then this loop
will make the right thing happen.
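
To make that concrete, here is a stand-alone sketch (mine, not kernel
code; the 32-CPU group and CPU numbers are made up, and it assumes the
same typical LP64 compiler behavior as above) of the sign-extension
case: because the set side and the test side evaluate the same buggy
1 << N, the intended CPU still matches.

	#include <stdio.h>

	int main(void)
	{
		unsigned long have_cbs = 0;
		int grplo = 480, grphi = 511;	/* hypothetical 32-CPU leaf group */
		int cpu = 511;			/* offset 31: 1 << 31 sign-extends on LP64 */
		int c;

		/* Set side: the expression used for sdp->grpmask, whose value
		 * ends up OR'd into srcu_data_have_cbs[].  The int shift also
		 * sets bits 32-63 of the 64-bit mask. */
		have_cbs |= 1 << (cpu - grplo);

		/* Test side, as in srcu_schedule_cbs_snp(): the same buggy
		 * expression is evaluated, so only CPU 511 matches. */
		for (c = grplo; c <= grphi; c++)
			if (have_cbs & (1 << (c - grplo)))
				printf("would schedule callbacks for CPU %d\n", c);
		return 0;
	}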

But even that is not relied upon, because the calling code looks like
this:

			spin_lock_irq_rcu_node(snp);
			cbs = false;
			last_lvl = snp >= sup->level[rcu_num_lvls - 1];
			if (last_lvl)
				cbs = ss_state < SRCU_SIZE_BIG || snp->srcu_have_cbs[idx] == gpseq;
			snp->srcu_have_cbs[idx] = gpseq;
			rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
			sgsne = snp->srcu_gp_seq_needed_exp;
			if (srcu_invl_snp_seq(sgsne) || ULONG_CMP_LT(sgsne, gpseq))
				WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
			if (ss_state < SRCU_SIZE_BIG)
				mask = ~0;
			else
				mask = snp->srcu_data_have_cbs[idx];
			snp->srcu_data_have_cbs[idx] = 0;
			spin_unlock_irq_rcu_node(snp);
			if (cbs)
				srcu_schedule_cbs_snp(ssp, snp, mask, cbdelay);

So that last_lvl check means that the srcu_schedule_cbs_snp() function
is invoked only for leaf srcu_node structures, which by default limits
the shift count to at most 15 (16 CPUs per leaf).

So this bug appears to have no effect for default RCU setups, even on very
large 64-bit systems, which is consistent with field experience.  Even if
this is the case, it still should be fixed, to avoid confusion if nothing
else.  Or just in case someone decides to set CONFIG_RCU_FANOUT_LEAF=32.
Which actually happened the other day due to someone trusting ChatGPT's
opinion about RCU Kconfig options...

And I therefore queued Denis's v3 patch with an edited commit log.
Of course, if anyone sees some other way that the bug could manifest
other than in a 64-bit kernel built with CONFIG_RCU_FANOUT_LEAF greater
than 30 on a system with at least 31 CPUs, please let me know so that
I can adjust.

							Thanx, Paul

------------------------------------------------------------------------

commit ed083b0e22f1396dee3599896249a3f218845298
Author: Denis Arefev <arefev@xxxxxxxxx>
Date:   Mon Sep 4 15:21:14 2023 +0300

     Fix srcu_struct node grpmask overflow on 64-bit systems

     The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)

AFAIU, the overflow resides in the "bitwise expression" and not
the arithmetic expression.

Other than this, please add my

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>


     is subject to overflow due to a failure to cast operands to a larger
     data type before performing arithmetic.

     The maximum result of this subtraction is defined by the RCU_FANOUT_LEAF
     Kconfig option, which on 64-bit systems defaults to 16 (resulting in a
     maximum shift of 15), but which can be set up as high as 64 (resulting
     in a maximum shift of 63).  A value of 31 can result in sign extension,
     resulting in 0xffffffff80000000 instead of the desired 0x80000000.
     A value of 31 or greater triggers undefined behavior per the C standard.

     This bug has not been known to cause issues because almost all kernels
     take the default CONFIG_RCU_FANOUT_LEAF=16.  Furthermore, as long as
     a given compiler gives a deterministic result for 1<<N for N>=32, the
     code correctly invokes all SRCU callbacks, albeit wasting CPU time along
     the way.

     This commit therefore substitutes the correct 1UL for the buggy 1.

     Found by Linux Verification Center (linuxtesting.org) with SVACE.

     Signed-off-by: Denis Arefev <arefev@xxxxxxxxx>
     Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
     Cc: David Laight <David.Laight@xxxxxxxxxx>
     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 833a8f848a90..5602042856b1 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
  				snp->grplo = cpu;
  			snp->grphi = cpu;
  		}
-		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
+		sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
  	}
  	smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
  	return true;
@@ -835,7 +835,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
  	int cpu;

  	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-		if (!(mask & (1 << (cpu - snp->grplo))))
+		if (!(mask & (1UL << (cpu - snp->grplo))))
  			continue;
  		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
  	}

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com



