Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Thu, 7 Mar 2024 11:47:35 -0800

On Thu, Mar 07, 2024 at 08:53:05AM -0500, Mathieu Desnoyers wrote:
> On 2024-03-06 22:37, Paul E. McKenney wrote:
> > On Wed, Mar 06, 2024 at 10:06:21PM -0500, Mathieu Desnoyers wrote:
> [...]
> > 
> > > As far as the WRITE_ONCE(x, READ_ONCE(x) + 1) pattern
> > > is concerned, the only valid use-case I can think of is
> > > split counters or RCU implementations where there is a
> > > single updater doing the increment, and one or more
> > > concurrent reader threads that need to snapshot a
> > > consistent value with READ_ONCE().
> > 
> [...]
> > 
> > So what would you use that pattern for?
> > 
> > One possibility is a per-CPU statistical counter in userspace on a
> > fastpath, in cases where losing the occasional count is OK.  Then learning
> > your CPU (and possibly being immediately migrated to some other CPU),
> > READ_ONCE() of the count, increment, and WRITE_ONCE() might (or might not)
> > make sense.
> > 
> > I suppose the same in the kernel if there was a fastpath so extreme you
> > could not afford to disable preemption.
> > 
> > At best, very niche.
> > 
> > Or am I suffering a failure of imagination yet again?  ;-)
> 
> The (niche) use-cases I have in mind are split-counters and RCU
> grace period tracking, where precise counters totals are needed
> (no lost count).
> 
> In the kernel, this could be:

Thank you for looking into this!

> - A per-cpu counter, each counter incremented from thread context with
>   preemption disabled (single updater per counter), read concurrently by
>   other threads. WRITE_ONCE/READ_ONCE is useful to make sure there
>   is no store/load tearing there. Atomics on the update would be stronger
>   than necessary on the increment fast-path.

But if preemption is disabled, the updater can read the value without
READ_ONCE() without risk of concurrent update.  Or are you concerned about
interrupt handlers?  This would have to be a read from the interrupt
handler, given that an updated from the interrupt handler could result
in a lost count.

> - A per-thread counter (e.g. within task_struct), only incremented by the
>   single thread, read by various threads concurrently.

Ditto.

> - A counter which increment happens to be already protected by a lock, read
>   by various threads without taking the lock. (technically doable, but
>   I'm not sure I see a relevant use-case for it)

In that case, the lock would exclude concurrent updates, so the lock
holder would not need READ_ONCE(), correct?

> In user-space:
> 
> - The "per-cpu" counter would have to use rseq for increments to prevent
>   inopportune migrations, which needs to be implemented in assembler anyway.
>   The counter reads would have to use READ_ONCE().

Fair enough!

> - The per-thread counter (Thread-Local Storage) incremented by a single
>   thread, read by various threads concurrently, is a good target
>   for WRITE_ONCE()/READ_ONCE() pairing. This is actually what we do in
>   various liburcu implementations which track read-side critical sections
>   per-thread.

Agreed, but do any of these use WRITE_ONCE(x, READ_ONCE(x) + 1) or
similar?

> - Same as for the kernel, a counter increment protected by a lock which
>   needs to be read from various threads concurrently without taking
>   the lock could be a valid use-case, though I fail to see how it is
>   useful due to lack of imagination on my part. ;-)

In RCU, we have "WRITE_ONCE(*sp, *sp + 1)" for this use case, though
here we have the WRITE_ONCE() but not the READ_ONCE() because we hold
the lock excluding any other updates.

> I'm possibly missing other use-cases, but those come to mind as not
> involving racy counter increments.
> 
> I agree that use-cases are so niche that we probably do _not_ want to
> introduce ADD_SHARED() for that purpose in a common header file,
> because I suspect that it would then become misused in plenty of
> scenarios where the updates are actually racy and would be better
> served by atomics or local-atomics.

I agree that unless or until we have a reasonable number of use cases, we
should open-code it.  With a big fat comment explaining how it works.  ;-)

							Thanx, Paul