Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Mon, 4 Mar 2024 12:47:48 -0800

On Mon, Mar 04, 2024 at 03:13:10PM -0500, Joel Fernandes wrote:
> 
> 
> On 3/4/2024 2:44 PM, Paul E. McKenney wrote:
> > On Mon, Mar 04, 2024 at 02:10:09PM -0500, Joel Fernandes wrote:
> >>
> >>
> >> On 3/4/2024 12:14 PM, Paul E. McKenney wrote:
> >>> On Mon, Mar 04, 2024 at 11:19:21AM -0500, Joel Fernandes wrote:
> >>>>
> >>>>
> >>>> On 3/4/2024 5:54 AM, linke li wrote:
> >>>>> Some changes were made to fix a data race in commit 202489101f2e ("rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race"):
> >>>>>
> >>>>>  {
> >>>>>  	int i;
> >>>>>
> >>>>> -	i = rp->rtort_pipe_count;
> >>>>> +	i = READ_ONCE(rp->rtort_pipe_count);
> >>>>>  	if (i > RCU_TORTURE_PIPE_LEN)
> >>>>>  		i = RCU_TORTURE_PIPE_LEN;
> >>>>>  	atomic_inc(&rcu_torture_wcount[i]);
> >>>>> -	if (++rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> >>>>> +	WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> >>>>> +	if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> >>>>>  		rp->rtort_mbtest = 0;
> >>>>>  		return true;
> >>>>>  	}
> >>>>>
> >>>>> But ++rp->rtort_pipe_count is meant to increment the field itself by 1, not
> >>>>> to store i + 1 into rp->rtort_pipe_count, because rp->rtort_pipe_count may
> >>>>> be written concurrently by rcu_torture_writer().
> >>>>>
> >>>>> Also, rp->rtort_pipe_count in the next line should be read using
> >>>>> READ_ONCE() because of the data race.
> >>>>>
> >>>>> Signed-off-by: linke li <lilinke99@xxxxxx>
> >>>>> ---
> >>>>>  kernel/rcu/rcutorture.c | 4 ++--
> >>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> >>>>> index 7567ca8e743c..00059ace4fd5 100644
> >>>>> --- a/kernel/rcu/rcutorture.c
> >>>>> +++ b/kernel/rcu/rcutorture.c
> >>>>> @@ -465,8 +465,8 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
> >>>>>  	if (i > RCU_TORTURE_PIPE_LEN)
> >>>>>  		i = RCU_TORTURE_PIPE_LEN;
> >>>>>  	atomic_inc(&rcu_torture_wcount[i]);
> >>>>> -	WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> >>>>> -	if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> >>>>> +	WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
> >>>>> +	if (READ_ONCE(rp->rtort_pipe_count) >= RCU_TORTURE_PIPE_LEN) {
> >>>>
> >>>> I want to say, I am not convinced by the patch: what's wrong with
> >>>> writing to an old index?
> >>>>
> >>>> You win/lose the race anyway, say the CPU executed the WRITE_ONCE() a bit too
> >>>> early/late and another WRITE_ONCE() lost/won, regardless of whether you wrote
> >>>> the "incremented i" or "the increment from the latest value of pipe_count".
> >>>>
> >>>> Anyway, a slightly related/different question:
> >>>>
> >>>> Should that:
> >>>> WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
> >>>>
> >>>> Be:
> >>>> WRITE_ONCE(rp->rtort_pipe_count, READ_ONCE(rp->rtort_pipe_count) + 1);
> >>>>
> >>>> ?
> >>>
> >>> Thank you both!
> >>>
> >>> At first glance, I would argue for something like this:
> >>>
> >>> ------------------------------------------------------------------------
> >>>
> >>> static bool
> >>> rcu_torture_pipe_update_one(struct rcu_torture *rp)
> >>> {
> >>> 	int i;
> >>> 	struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
> >>>
> >>> 	if (rtrcp) {
> >>> 		WRITE_ONCE(rp->rtort_chkp, NULL);
> >>> 		smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
> >>> 	}
> >>> 	i = READ_ONCE(rp->rtort_pipe_count) + 1;
> >>> 	if (i > RCU_TORTURE_PIPE_LEN)
> >>> 		i = RCU_TORTURE_PIPE_LEN;
> >>> 	atomic_inc(&rcu_torture_wcount[i]);
> >>> 	WRITE_ONCE(rp->rtort_pipe_count, i);
> >>> 	if (i >= RCU_TORTURE_PIPE_LEN) {
> >>> 		rp->rtort_mbtest = 0;
> >>> 		return true;
> >>> 	}
> >>> 	return false;
> >>> }
> >>>
> >>> ------------------------------------------------------------------------
> >>>
> >>> That is, move the increment to the read and replace the re-read with
> >>> the value "i" that was just written.
> >>
> >> But that changes the original logic as well? It looks like with the above
> >> change, you're now writing to rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)
> >> + 1] instead of rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)].
> >>
> >> I think that might break rcutorture, because there is an increment outside of
> >> the first 2 entries in rcu_torture_wcount, but I am not sure (need to look more).
> > 
> > Good point on never incrementing the zeroth entry!  Clearly I should
> > have waited before replying.
> > 
> > How about the following?
> > 
> > ------------------------------------------------------------------------
> > 
> > static bool
> > rcu_torture_pipe_update_one(struct rcu_torture *rp)
> > {
> > 	int i;
> > 	struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
> > 
> > 	if (rtrcp) {
> > 		WRITE_ONCE(rp->rtort_chkp, NULL);
> > 		smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
> > 	}
> > 	i = READ_ONCE(rp->rtort_pipe_count);
> > 	if (i > RCU_TORTURE_PIPE_LEN)
> > 		i = RCU_TORTURE_PIPE_LEN;
> > 	atomic_inc(&rcu_torture_wcount[i]);
> > 	WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> > 	if (i + 1 >= RCU_TORTURE_PIPE_LEN) {
> > 		rp->rtort_mbtest = 0;
> > 		return true;
> > 	}
> > 	return false;
> > }
> 
> Yes, this looks good to me. Thanks,
> Reviewed-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>

Again, thank you.

linke li, does this approach work for you?  If so, would you be willing to
send a new patch along these lines?  If it does not work, what additional
problems do you see?
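
For anyone wanting to poke at the counting logic outside the kernel, here
is a minimal userspace sketch of the second version quoted above. It is a
single-threaded model only, so it says nothing about the race itself. The
READ_ONCE()/WRITE_ONCE() definitions are simplified volatile-cast stand-ins
(an assumption, the kernel's macros do more), the struct is cut down to the
two fields touched here, a plain array replaces the atomic_t counters, and
every name ending in _model or _MODEL is hypothetical, as is the pipe
length of 10 chosen for the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed pipe length for this model only. */
#define PIPE_LEN_MODEL 10

/* Simplified stand-ins (assumption): force one volatile access each. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical reduced structure: only the fields used below. */
struct rcu_torture_model {
	int rtort_pipe_count;
	int rtort_mbtest;
};

/* Plain counters in place of the kernel's atomic_t array. */
static long wcount_model[PIPE_LEN_MODEL + 1];

static bool pipe_update_one_model(struct rcu_torture_model *rp)
{
	/* One marked load; everything below works from this snapshot. */
	int i = READ_ONCE(rp->rtort_pipe_count);

	if (i > PIPE_LEN_MODEL)
		i = PIPE_LEN_MODEL;
	assert(i >= 0);		/* keep the index within the array */
	wcount_model[i]++;	/* indexed by the pre-increment value */
	WRITE_ONCE(rp->rtort_pipe_count, i + 1);
	if (i + 1 >= PIPE_LEN_MODEL) {	/* test the snapshot, no re-read */
		rp->rtort_mbtest = 0;
		return true;
	}
	return false;
}
```

Calling pipe_update_one_model() repeatedly on an element whose count starts
at zero bumps wcount_model[0] on the first call, which is the entry that
the earlier i = READ_ONCE(...) + 1 variant would have skipped, and returns
true on the call that stores PIPE_LEN_MODEL.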

							Thanx, Paul