Re: [PATCH v3] Fix srcu_struct node grpmask overflow on 64-bit systems


On 9/5/23 09:38, Paul E. McKenney wrote:
On Tue, Sep 05, 2023 at 08:57:53AM -0400, Mathieu Desnoyers wrote:
On 9/4/23 09:58, Paul E. McKenney wrote:
On Mon, Sep 04, 2023 at 08:58:48AM -0400, Mathieu Desnoyers wrote:
On 9/4/23 08:42, Mathieu Desnoyers wrote:
On 9/4/23 08:21, Denis Arefev wrote:
The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
is subject to overflow due to a failure to cast operands to a larger
data type before performing arithmetic.

The maximum result of this subtraction is defined by the RCU_FANOUT
or other srcu level-spread values assigned by rcu_init_levelspread(),
which can indeed cause the signed 32-bit integer literal ("1") to overflow
when shifted by any value greater than 31.

We could expand on this:

The maximum result of this subtraction is defined by the RCU_FANOUT
or other srcu level-spread values assigned by rcu_init_levelspread(),
which can indeed cause the signed 32-bit integer literal ("1") to overflow
when shifted by any value greater than 31 on a 64-bit system.

Moreover, when the subtraction value is 31, the 1 << 31 expression yields
a negative int that becomes 0xffffffff80000000 when it is converted to
unsigned long on 64-bit systems, because the conversion sign-extends it.
This is certainly not the intended result.
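A small user-space sketch makes the sign extension concrete. It assumes an LP64 target (32-bit int, 64-bit unsigned long, as on x86-64 Linux) and starts from INT_MIN instead of evaluating 1 << 31 directly, since that shift is undefined behavior in ISO C. GCC documents that it does not exploit this latitude for signed left shifts, which is why the kernel sees INT_MIN in practice. The helper names old_bit31(), new_bit31() and old_test_matches() are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Illustration only: assumes an LP64 target such as x86-64 Linux. */
static_assert(sizeof(int) == 4 && sizeof(unsigned long) == 8,
	      "LP64 target assumed");

/*
 * Model of the old expression at offset 31: on GCC, 1 << 31 evaluates to
 * INT_MIN. Converting a negative int to unsigned long is well defined
 * (the value is reduced modulo 2^64), which leaves bits 31..63 set.
 */
static unsigned long old_bit31(void)
{
	int bit31 = INT_MIN;		/* the value GCC gives for 1 << 31 */

	return (unsigned long)bit31;	/* 0xffffffff80000000 */
}

/* The patched expression: the shift happens in unsigned long. */
static unsigned long new_bit31(void)
{
	return 1UL << 31;		/* 0x0000000080000000 */
}

/*
 * Consequence for a test such as mask & (1 << 31): a mask holding only a
 * higher bit (for example bit 40) still matches the sign-extended value.
 */
static int old_test_matches(unsigned long mask)
{
	return (mask & old_bit31()) != 0;
}
```

For example, old_test_matches(1UL << 40) is true even though bit 31 of the mask is clear, whereas (1UL << 40) & new_bit31() is zero.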

Thank you both!  Could you please also add something to the effect of:
"Given default Kconfig options, this bug affects only systems with more
than 512 CPUs."?

Hi Paul,

I'm trying to understand this "NR_CPUS > 512 CPUs" default Kconfig lower
bound from kernel/rcu/Kconfig and rcu_node_tree.h. Is that on a 32-bit or
64-bit architecture? Also, I suspect that something like x86-64 MAXSMP (or
an explicit NR_CPUS) needs to be selected instead of the default Kconfig to
support that many CPUs.

64-bit only.  I believe that 32-bit kernels are unaffected by this bug.

The trick is that RCU reshapes the rcu_node tree in rcu_init_geometry(),
which is invoked during early boot from rcu_init().  This reshaping is
based on nr_cpu_ids.  So if NR_CPUS is (say) 4096, there will be enough
rcu_node structures allocated at build time to accommodate 4096 CPUs
(261 of them: 256 leaf nodes, four internal nodes, and one root node),
but only assuming dense numbering of CPUs.  If rcu_init_geometry() sees
that nr_cpu_ids is (say) 64, it will use only five of them, that is,
four leaf nodes and one root node.  The leaf nodes (16 CPUs each) will
need to shift by at most 15, and the root node (four children) by at
most 3.
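The node counts above can be checked with a simplified model of the geometry. This sketch assumes the 64-bit defaults of 16 CPUs per leaf (RCU_FANOUT_LEAF) and 64 children per interior node (RCU_FANOUT), and dense CPU numbering. tree_node_count() is a hypothetical helper for illustration, not the kernel's rcu_init_geometry(); it only reproduces the per-level rounding up.

```c
#include <assert.h>

#define MAX_LEVELS 4

/* ceil(a / b) for positive ints. */
static int div_round_up(int a, int b)
{
	return (a + b - 1) / b;
}

/*
 * Simplified model (not kernel code) of sizing a combining tree with
 * fanout_leaf CPUs per leaf and fanout children per interior node.
 * Fills levelcnt[0..*levels-1] (index 0 is the root) and returns the
 * total number of nodes, or -1 if MAX_LEVELS levels are not enough.
 */
static int tree_node_count(int ncpus, int fanout_leaf, int fanout,
			   int levelcnt[MAX_LEVELS], int *levels)
{
	int cap[MAX_LEVELS];	/* cap[k]: CPUs covered by a (k+1)-level subtree */
	int l, d, total = 0;

	assert(ncpus > 0 && fanout_leaf > 0 && fanout > 1);

	cap[0] = fanout_leaf;
	for (l = 1; l < MAX_LEVELS; l++)
		cap[l] = cap[l - 1] * fanout;

	/* Find the smallest number of levels that covers ncpus. */
	for (l = 0; l < MAX_LEVELS && ncpus > cap[l]; l++)
		;
	if (l == MAX_LEVELS)
		return -1;
	*levels = l + 1;

	/* Each level needs enough nodes to cover ncpus at its capacity. */
	for (d = 0; d < *levels; d++) {
		levelcnt[d] = div_round_up(ncpus, cap[*levels - 1 - d]);
		total += levelcnt[d];
	}
	return total;
}
```

Under these assumptions, tree_node_count(4096, 16, 64, lc, &lv) gives 1 + 4 + 256 = 261 nodes over three levels, and tree_node_count(64, 16, 64, lc, &lv) gives 5 nodes over two levels. With 16 CPUs per leaf, the offset cpu - grplo within a leaf never exceeds 15.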

But the possibility of sparse CPU numbering (perhaps to your point)
means that the bug can occur in 64-bit kernels booted on systems with
512 CPUs or fewer if that system has sparse CPU IDs.  For example,
there have been systems that disable all but one hardware thread per
core, but leave places in the CPU numbering for those disabled threads.
Such a system with four hardware threads per core could have a CPU 512
(and thus be affected by this bug) with as few as 129 CPUs.

So a better request would be for something like: "Given default Kconfig
options, this bug affects only 64-bit systems having at least one CPU
for which smp_processor_id() returns 512 or greater."
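The sparse-numbering arithmetic can be made concrete with a small sketch. It assumes, hypothetically, that thread 0 of each core is the one left online, so online CPUs get IDs 0, 4, 8, and so on. highest_cpu_id() and srcu_grpmask_bug_possible() are illustrative names, and the 512 threshold is simply the rule of thumb stated above for 64-bit kernels with default Kconfig options.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: every core reserves threads_per_core slots in the
 * CPU numbering, but only thread 0 of each core is online. The n-th
 * online CPU (counting from 1) then has ID threads_per_core * (n - 1).
 */
static int highest_cpu_id(int online_cpus, int threads_per_core)
{
	assert(online_cpus > 0 && threads_per_core > 0);
	return threads_per_core * (online_cpus - 1);
}

/* Rule of thumb from the thread: 64-bit kernel, default Kconfig options. */
static bool srcu_grpmask_bug_possible(int highest_id)
{
	return highest_id >= 512;
}
```

With four threads per core, 128 online CPUs top out at ID 508 (not affected), while 129 online CPUs already reach ID 512 and 130 reach ID 516.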

Does that help, or am I missing your point?

This is a good point, although not the one I was trying to make. See my separate email explaining the impact of exactly 512 with respect to signed integer type promotion. Your last phrasing, "returns 512 or greater", is better: previously it appeared that only systems with _more than_ 512 CPUs were affected, which was off by one, since systems with exactly 512 CPUs are affected as well.

Thanks,

Mathieu



							Thanx, Paul


Found by Linux Verification Center (linuxtesting.org) with SVACE.

With the commit message updated with my comment above, please also add:

Fixes: c7e88067c1 ("srcu: Exact tracking of srcu_data structures containing callbacks")
Cc: <stable@xxxxxxxxxxxxxxx> # v4.11

Sorry, the line above should read:

Cc: <stable@xxxxxxxxxxxxxxx> # v4.11+

Thanks,

Mathieu

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>

Thanks!

Mathieu


Signed-off-by: Denis Arefev <arefev@xxxxxxxxx>
---
v3: Changed the name of the patch, as suggested by
Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
v2: Added fixes to the srcu_schedule_cbs_snp function as suggested by
Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>

 kernel/rcu/srcutree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 20d7a238d675..6c18e6005ae1 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
 				snp->grplo = cpu;
 			snp->grphi = cpu;
 		}
-		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
+		sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
 	}
 	smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
 	return true;
@@ -833,7 +833,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
 	int cpu;
 
 	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-		if (!(mask & (1 << (cpu - snp->grplo))))
+		if (!(mask & (1UL << (cpu - snp->grplo))))
 			continue;
 		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
 	}
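For readers who want to exercise the patched bit test outside the kernel, the loop in the second hunk can be modeled in user space. This is a sketch assuming an LP64 target; cpus_in_mask() is a hypothetical stand-in that records which CPUs would be handed to srcu_schedule_cbs_sdp() instead of calling it. With a group spanning 64 CPUs (512 through 575), the bits at offsets 31 and 40 select CPUs 543 and 552 only because the shifted operand is 1UL.

```c
#include <assert.h>
#include <limits.h>

/*
 * User-space model (not kernel code) of the patched loop in
 * srcu_schedule_cbs_snp(). Instead of calling srcu_schedule_cbs_sdp(),
 * it records each CPU in [grplo, grphi] whose bit, relative to grplo,
 * is set in mask. Returns the number of such CPUs; at most max_out of
 * them are stored in out[].
 */
static int cpus_in_mask(int grplo, int grphi, unsigned long mask,
			int *out, int max_out)
{
	int cpu, n = 0;

	/* Every shift count must stay below the width of unsigned long. */
	assert(grphi - grplo < (int)(sizeof(unsigned long) * CHAR_BIT));

	for (cpu = grplo; cpu <= grphi; cpu++) {
		/* 1UL keeps the shift in unsigned long, so offsets >= 31 work. */
		if (!(mask & (1UL << (cpu - grplo))))
			continue;
		if (n < max_out)
			out[n] = cpu;
		n++;
	}
	return n;
}
```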


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

