On Wed, Nov 23, 2022 at 09:56:38PM +0800, Pingfan Liu wrote: > Since the srcu read lock is still held during srcu_funnel_gp_start(), > the seq snap should be the largest number for the slot > srcu_have_cbs[idx]. > > Signed-off-by: Pingfan Liu <kernelfans@xxxxxxxxx> > Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx> > Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx> > Cc: Frederic Weisbecker <frederic@xxxxxxxxxx> > Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx> > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx> > Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> > To: rcu@xxxxxxxxxxxxxxx > --- > include/linux/rcupdate.h | 1 + > kernel/rcu/srcutree.c | 11 ++++++----- > 2 files changed, 7 insertions(+), 5 deletions(-) > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index 08605ce7379d..a09007236660 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -32,6 +32,7 @@ > #include <linux/context_tracking_irq.h> > > #define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b)) > +#define ULONG_CMP_GT(a, b) (ULONG_MAX / 2 > (a) - (b)) Please see below... > #define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b)) > #define ulong2long(a) (*(long *)(&(a))) > #define USHORT_CMP_GE(a, b) (USHRT_MAX / 2 >= (unsigned short)((a) - (b))) > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c > index 2fc0e775ade4..41902b823687 100644 > --- a/kernel/rcu/srcutree.c > +++ b/kernel/rcu/srcutree.c > @@ -905,14 +905,15 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp, > for (snp = snp_leaf; snp != NULL; snp = snp->srcu_parent) { > spin_lock_irqsave_rcu_node(snp, flags); > snp_seq = snp->srcu_have_cbs[idx]; > - if (!srcu_invl_snp_seq(snp_seq) && ULONG_CMP_GE(snp_seq, s)) { > + /* > + * s should be the biggest in the current slot. Hence only LE is > + * valid > + */ > + BUG_ON(ULONG_CMP_GT(snp_seq, s)); Why not this? (Plus adjusting the comment above, of course.) WARN_ON_ONCE(ULONG_CMP_LT(s, snp_seq)); That way we don't need ULONG_CMP_GT(). Plus if we are confused about s being the biggest in the current slot, we get a splat and can debug further. We both might be quite sure that we are not confused, but that is exactly when we are most prone to making mistakes. ;-) I also ask that you run with this check for some time. After all, if the assumption is incorrect, the resulting SRCU hangs off in various systems around the world will not be so much fun to debug. Given that this is slowpath code, it is much better to take an extra compare and branch than to introduce even an extremely small risk of hanging SRCU. Thanx, Paul > + if (!srcu_invl_snp_seq(snp_seq) && (snp_seq == s)) { > if (snp == snp_leaf && snp_seq == s) > snp->srcu_data_have_cbs[idx] |= sdp->grpmask; > spin_unlock_irqrestore_rcu_node(snp, flags); > - if (snp == snp_leaf && snp_seq != s) { > - srcu_schedule_cbs_sdp(sdp, do_norm ? SRCU_INTERVAL : 0); > - return; > - } > if (!do_norm) > srcu_funnel_exp_start(ssp, snp, s); > return; > -- > 2.31.1 >