Re: [PATCHv2 2/2] srcu: Eliminate the case that snp_seq bigger than snap in srcu_funnel_gp_start()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Nov 26, 2022, at 1:07 PM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> 
> On Sat, Nov 26, 2022 at 10:03:52AM -0800, Paul E. McKenney wrote:
>>> On Fri, Nov 25, 2022 at 08:37:03PM +0800, Pingfan Liu wrote:
>>> On Thu, Nov 24, 2022 at 08:19:26AM -0800, Paul E. McKenney wrote:
>>>> On Wed, Nov 23, 2022 at 09:24:17PM -0800, Paul E. McKenney wrote:
>>>>> On Thu, Nov 24, 2022 at 12:06:39PM +0800, Pingfan Liu wrote:
>>>>>> On Thu, Nov 24, 2022 at 11:33 AM Pingfan Liu <kernelfans@xxxxxxxxx> wrote:
>>>>>>> On Wed, Nov 23, 2022 at 11:20:46AM -0800, Paul E. McKenney wrote:
>>>>>>>> On Wed, Nov 23, 2022 at 09:56:38PM +0800, Pingfan Liu wrote:
>>>>>>>>> Since the srcu read lock is still held during srcu_funnel_gp_start(),
>>>>>>>>> the seq snap should be the largest number for the slot
>>>>>>>>> srcu_have_cbs[idx].
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Pingfan Liu <kernelfans@xxxxxxxxx>
>>>>>>>>> Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx>
>>>>>>>>> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
>>>>>>>>> Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
>>>>>>>>> Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
>>>>>>>>> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
>>>>>>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
>>>>>>>>> To: rcu@xxxxxxxxxxxxxxx
>>>>>>>>> ---
>>>>>>>>> include/linux/rcupdate.h |  1 +
>>>>>>>>> kernel/rcu/srcutree.c    | 11 ++++++-----
>>>>>>>>> 2 files changed, 7 insertions(+), 5 deletions(-)
>>>>>>>>> 
>>>>>>>>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>>>>>>>>> index 08605ce7379d..a09007236660 100644
>>>>>>>>> --- a/include/linux/rcupdate.h
>>>>>>>>> +++ b/include/linux/rcupdate.h
>>>>>>>>> @@ -32,6 +32,7 @@
>>>>>>>>> #include <linux/context_tracking_irq.h>
>>>>>>>>> 
>>>>>>>>> #define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
>>>>>>>>> +#define ULONG_CMP_GT(a, b) (ULONG_MAX / 2 > (a) - (b))
>>>>>>>> 
>>>>>>>> Please see below...
>>>>>>>> 
>>>>>>>>> #define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))
>>>>>>>>> #define ulong2long(a)              (*(long *)(&(a)))
>>>>>>>>> #define USHORT_CMP_GE(a, b)        (USHRT_MAX / 2 >= (unsigned short)((a) - (b)))
>>>>>>>>> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
>>>>>>>>> index 2fc0e775ade4..41902b823687 100644
>>>>>>>>> --- a/kernel/rcu/srcutree.c
>>>>>>>>> +++ b/kernel/rcu/srcutree.c
>>>>>>>>> @@ -905,14 +905,15 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
>>>>>>>>>            for (snp = snp_leaf; snp != NULL; snp = snp->srcu_parent) {
>>>>>>>>>                    spin_lock_irqsave_rcu_node(snp, flags);
>>>>>>>>>                    snp_seq = snp->srcu_have_cbs[idx];
>>>>>>>>> -                   if (!srcu_invl_snp_seq(snp_seq) && ULONG_CMP_GE(snp_seq, s)) {
>>>>>>>>> +                   /*
>>>>>>>>> +                    * s should be the biggest in the current slot. Hence only LE is
>>>>>>>>> +                    * valid
>>>>>>>>> +                    */
>>>>>>>>> +                   BUG_ON(ULONG_CMP_GT(snp_seq, s));
>>>>>>>> 
>>>>>>>> Why not this?  (Plus adjusting the comment above, of course.)
>>>>>>>> 
>>>>>>>>                      WARN_ON_ONCE(ULONG_CMP_LT(s, snp_seq));
>>>>>>>> 
>>>>>>> 
>>>>>>> A neat solution!
>>>>>>> 
>>>>>>> And what about the comment
>>>>>>>                        /* s should be the biggest in the current slot. */
>>>>>>> 
>>>>>>>> That way we don't need ULONG_CMP_GT().  Plus if we are confused about
>>>>>>>> s being the biggest in the current slot, we get a splat and can debug
>>>>>>>> further.  We both might be quite sure that we are not confused, but
>>>>>>>> that is exactly when we are most prone to making mistakes.  ;-)
>>>>>>>> 
>>>>>>> 
>>>>>>> Same feeling :)
>>>>>>> 
>>>>>>>> I also ask that you run with this check for some time.  After all, if
>>>>>> 
>>>>>> Forget to ask if  the test
>>>>>>  "tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
>>>>>> 10h --configs 18*SRCU-P"
>>>>>> satisfies your requirement?
>>>>> 
>>>>> I am thinking more in terms of adding the WARN_ON_ONCE(), letting it be
>>>>> in -next and possibly mainline for a couple of years, and then if there
>>>>> are no splats, start feeling more confident in the asserted relationship.
>>>>> 
>>>>> If there was a significant performance, scalability, energy-efficiency, or
>>>>> simplification benefit, I would feel justified in being more aggressive.
>>>>> 
>>>>> But I am not seeing a significant benefit.
>>>> 
>>>> Ah, and before I forget -again-, have you thought through the counter-wrap
>>>> scenarios?  Please keep in mind that on a 32-bit system, those counters
>>>> can wrap quite quickly compared to typical uptimes.
>>> 
>>> I think 32-bit has not significant different since the statement
>>>  WARN_ON_ONCE(ULONG_CMP_LT(s, snp_seq));
>>> 
>>> where 's < snp_seq' is not associated with bits. Am I missing anything?
>> 
>> Your point is that long before there was any danger of wrap, this
>> WARN_ON_ONCE() would trigger?
>> 
>> If so, the point is to avoid that WARN_ON_ONCE() from triggering.
>> 
>> Or am I missing your point?
> 
> For example, are you instead saying that the WARN_ON_ONCE() proposed in
> your patch will detect the problem either way?  If so, what I am asking
> is whether your earlier analysis considered wrap without the addition
> of that WARN_ON_ONCE().

Do we want to make some rules on when to delete WARNs and fold in the change, such as a future date after scanning the list for reports? Such date can be embedded in a comment on top of the warning.

Thanks,

    - Joel

>                            Thanx, Paul
> 
>>> Thanks,
>>> 
>>>    Pingfan
>>> 
>>>>                            Thanx, Paul
>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>>    Pingfan
>>>>>>> Sure. I will try it immediately.
>>>>>>> 
>>>>>>>> the assumption is incorrect, the resulting SRCU hangs off in various
>>>>>>>> systems around the world will not be so much fun to debug.
>>>>>>>> 
>>>>>>>> Given that this is slowpath code, it is much better to take an extra
>>>>>>>> compare and branch than to introduce even an extremely small risk of
>>>>>>>> hanging SRCU.
>>>>>>>> 
>>>>>>> 
>>>>>>> Agree.
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>>        Pingfan
>>>>>>> 
>>>>>>>>                                                      Thanx, Paul
>>>>>>>> 
>>>>>>>>> +                   if (!srcu_invl_snp_seq(snp_seq) && (snp_seq == s)) {
>>>>>>>>>                            if (snp == snp_leaf && snp_seq == s)
>>>>>>>>>                                    snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
>>>>>>>>>                            spin_unlock_irqrestore_rcu_node(snp, flags);
>>>>>>>>> -                           if (snp == snp_leaf && snp_seq != s) {
>>>>>>>>> -                                   srcu_schedule_cbs_sdp(sdp, do_norm ? SRCU_INTERVAL : 0);
>>>>>>>>> -                                   return;
>>>>>>>>> -                           }
>>>>>>>>>                            if (!do_norm)
>>>>>>>>>                                    srcu_funnel_exp_start(ssp, snp, s);
>>>>>>>>>                            return;
>>>>>>>>> --
>>>>>>>>> 2.31.1
>>>>>>>>> 




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux