> On Nov 26, 2022, at 1:07 PM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > On Sat, Nov 26, 2022 at 10:03:52AM -0800, Paul E. McKenney wrote: >>> On Fri, Nov 25, 2022 at 08:37:03PM +0800, Pingfan Liu wrote: >>> On Thu, Nov 24, 2022 at 08:19:26AM -0800, Paul E. McKenney wrote: >>>> On Wed, Nov 23, 2022 at 09:24:17PM -0800, Paul E. McKenney wrote: >>>>> On Thu, Nov 24, 2022 at 12:06:39PM +0800, Pingfan Liu wrote: >>>>>> On Thu, Nov 24, 2022 at 11:33 AM Pingfan Liu <kernelfans@xxxxxxxxx> wrote: >>>>>>> On Wed, Nov 23, 2022 at 11:20:46AM -0800, Paul E. McKenney wrote: >>>>>>>> On Wed, Nov 23, 2022 at 09:56:38PM +0800, Pingfan Liu wrote: >>>>>>>>> Since the srcu read lock is still held during srcu_funnel_gp_start(), >>>>>>>>> the seq snap should be the largest number for the slot >>>>>>>>> srcu_have_cbs[idx]. >>>>>>>>> >>>>>>>>> Signed-off-by: Pingfan Liu <kernelfans@xxxxxxxxx> >>>>>>>>> Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx> >>>>>>>>> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx> >>>>>>>>> Cc: Frederic Weisbecker <frederic@xxxxxxxxxx> >>>>>>>>> Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx> >>>>>>>>> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx> >>>>>>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> >>>>>>>>> To: rcu@xxxxxxxxxxxxxxx >>>>>>>>> --- >>>>>>>>> include/linux/rcupdate.h | 1 + >>>>>>>>> kernel/rcu/srcutree.c | 11 ++++++----- >>>>>>>>> 2 files changed, 7 insertions(+), 5 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h >>>>>>>>> index 08605ce7379d..a09007236660 100644 >>>>>>>>> --- a/include/linux/rcupdate.h >>>>>>>>> +++ b/include/linux/rcupdate.h >>>>>>>>> @@ -32,6 +32,7 @@ >>>>>>>>> #include <linux/context_tracking_irq.h> >>>>>>>>> >>>>>>>>> #define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b)) >>>>>>>>> +#define ULONG_CMP_GT(a, b) (ULONG_MAX / 2 > (a) - (b)) >>>>>>>> >>>>>>>> Please see below... >>>>>>>> >>>>>>>>> #define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b)) >>>>>>>>> #define ulong2long(a) (*(long *)(&(a))) >>>>>>>>> #define USHORT_CMP_GE(a, b) (USHRT_MAX / 2 >= (unsigned short)((a) - (b))) >>>>>>>>> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c >>>>>>>>> index 2fc0e775ade4..41902b823687 100644 >>>>>>>>> --- a/kernel/rcu/srcutree.c >>>>>>>>> +++ b/kernel/rcu/srcutree.c >>>>>>>>> @@ -905,14 +905,15 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp, >>>>>>>>> for (snp = snp_leaf; snp != NULL; snp = snp->srcu_parent) { >>>>>>>>> spin_lock_irqsave_rcu_node(snp, flags); >>>>>>>>> snp_seq = snp->srcu_have_cbs[idx]; >>>>>>>>> - if (!srcu_invl_snp_seq(snp_seq) && ULONG_CMP_GE(snp_seq, s)) { >>>>>>>>> + /* >>>>>>>>> + * s should be the biggest in the current slot. Hence only LE is >>>>>>>>> + * valid >>>>>>>>> + */ >>>>>>>>> + BUG_ON(ULONG_CMP_GT(snp_seq, s)); >>>>>>>> >>>>>>>> Why not this? (Plus adjusting the comment above, of course.) >>>>>>>> >>>>>>>> WARN_ON_ONCE(ULONG_CMP_LT(s, snp_seq)); >>>>>>>> >>>>>>> >>>>>>> A neat solution! >>>>>>> >>>>>>> And what about the comment >>>>>>> /* s should be the biggest in the current slot. */ >>>>>>> >>>>>>>> That way we don't need ULONG_CMP_GT(). Plus if we are confused about >>>>>>>> s being the biggest in the current slot, we get a splat and can debug >>>>>>>> further. We both might be quite sure that we are not confused, but >>>>>>>> that is exactly when we are most prone to making mistakes. ;-) >>>>>>>> >>>>>>> >>>>>>> Same feeling :) >>>>>>> >>>>>>>> I also ask that you run with this check for some time. After all, if >>>>>> >>>>>> Forget to ask if the test >>>>>> "tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration >>>>>> 10h --configs 18*SRCU-P" >>>>>> satisfies your requirement? >>>>> >>>>> I am thinking more in terms of adding the WARN_ON_ONCE(), letting it be >>>>> in -next and possibly mainline for a couple of years, and then if there >>>>> are no splats, start feeling more confident in the asserted relationship. >>>>> >>>>> If there was a significant performance, scalability, energy-efficiency, or >>>>> simplification benefit, I would feel justified in being more aggressive. >>>>> >>>>> But I am not seeing a significant benefit. >>>> >>>> Ah, and before I forget -again-, have you thought through the counter-wrap >>>> scenarios? Please keep in mind that on a 32-bit system, those counters >>>> can wrap quite quickly compared to typical uptimes. >>> >>> I think 32-bit has not significant different since the statement >>> WARN_ON_ONCE(ULONG_CMP_LT(s, snp_seq)); >>> >>> where 's < snp_seq' is not associated with bits. Am I missing anything? >> >> Your point is that long before there was any danger of wrap, this >> WARN_ON_ONCE() would trigger? >> >> If so, the point is to avoid that WARN_ON_ONCE() from triggering. >> >> Or am I missing your point? > > For example, are you instead saying that the WARN_ON_ONCE() proposed in > your patch will detect the problem either way? If so, what I am asking > is whether your earlier analysis considered wrap without the addition > of that WARN_ON_ONCE(). Do we want to make some rules on when to delete WARNs and fold in the change, such as a future date after scanning the list for reports? Such date can be embedded in a comment on top of the warning. Thanks, - Joel > Thanx, Paul > >>> Thanks, >>> >>> Pingfan >>> >>>> Thanx, Paul >>>> >>>>>> Thanks >>>>>> >>>>>> Pingfan >>>>>>> Sure. I will try it immediately. >>>>>>> >>>>>>>> the assumption is incorrect, the resulting SRCU hangs off in various >>>>>>>> systems around the world will not be so much fun to debug. >>>>>>>> >>>>>>>> Given that this is slowpath code, it is much better to take an extra >>>>>>>> compare and branch than to introduce even an extremely small risk of >>>>>>>> hanging SRCU. >>>>>>>> >>>>>>> >>>>>>> Agree. >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Pingfan >>>>>>> >>>>>>>> Thanx, Paul >>>>>>>> >>>>>>>>> + if (!srcu_invl_snp_seq(snp_seq) && (snp_seq == s)) { >>>>>>>>> if (snp == snp_leaf && snp_seq == s) >>>>>>>>> snp->srcu_data_have_cbs[idx] |= sdp->grpmask; >>>>>>>>> spin_unlock_irqrestore_rcu_node(snp, flags); >>>>>>>>> - if (snp == snp_leaf && snp_seq != s) { >>>>>>>>> - srcu_schedule_cbs_sdp(sdp, do_norm ? SRCU_INTERVAL : 0); >>>>>>>>> - return; >>>>>>>>> - } >>>>>>>>> if (!do_norm) >>>>>>>>> srcu_funnel_exp_start(ssp, snp, s); >>>>>>>>> return; >>>>>>>>> -- >>>>>>>>> 2.31.1 >>>>>>>>>