On Thu, Mar 07, 2024 at 02:53:43PM -0500, Mathieu Desnoyers wrote: > On 2024-03-07 14:47, Paul E. McKenney wrote: > > On Thu, Mar 07, 2024 at 08:53:05AM -0500, Mathieu Desnoyers wrote: > > > On 2024-03-06 22:37, Paul E. McKenney wrote: > > > > On Wed, Mar 06, 2024 at 10:06:21PM -0500, Mathieu Desnoyers wrote: > > > [...] > > > > > > > > > As far as the WRITE_ONCE(x, READ_ONCE(x) + 1) pattern > > > > > is concerned, the only valid use-case I can think of is > > > > > split counters or RCU implementations where there is a > > > > > single updater doing the increment, and one or more > > > > > concurrent reader threads that need to snapshot a > > > > > consistent value with READ_ONCE(). > > > > > > > [...] > > > > > > > > So what would you use that pattern for? > > > > > > > > One possibility is a per-CPU statistical counter in userspace on a > > > > fastpath, in cases where losing the occasional count is OK. Then learning > > > > your CPU (and possibly being immediately migrated to some other CPU), > > > > READ_ONCE() of the count, increment, and WRITE_ONCE() might (or might not) > > > > make sense. > > > > > > > > I suppose the same in the kernel if there was a fastpath so extreme you > > > > could not afford to disable preemption. > > > > > > > > At best, very niche. > > > > > > > > Or am I suffering a failure of imagination yet again? ;-) > > > > > > The (niche) use-cases I have in mind are split-counters and RCU > > > grace period tracking, where precise counters totals are needed > > > (no lost count). > > > > > > In the kernel, this could be: > > > > Thank you for looking into this! > > > > > - A per-cpu counter, each counter incremented from thread context with > > > preemption disabled (single updater per counter), read concurrently by > > > other threads. WRITE_ONCE/READ_ONCE is useful to make sure there > > > is no store/load tearing there. Atomics on the update would be stronger > > > than necessary on the increment fast-path. > > > > But if preemption is disabled, the updater can read the value without > > READ_ONCE() without risk of concurrent update. Or are you concerned about > > interrupt handlers? This would have to be a read from the interrupt > > handler, given that an updated from the interrupt handler could result > > in a lost count. > > You are correct that the updater don't need READ_ONCE there. It would > however require a WRITE_ONCE() to match READ_ONCE() from concurrent > reader threads. > > > > > > - A per-thread counter (e.g. within task_struct), only incremented by the > > > single thread, read by various threads concurrently. > > > > Ditto. > > Right, only WRITE_ONCE() on the single updater, READ_ONCE() on readers. > > > > > > - A counter which increment happens to be already protected by a lock, read > > > by various threads without taking the lock. (technically doable, but > > > I'm not sure I see a relevant use-case for it) > > > > In that case, the lock would exclude concurrent updates, so the lock > > holder would not need READ_ONCE(), correct? > > Correct. > > > > > > In user-space: > > > > > > - The "per-cpu" counter would have to use rseq for increments to prevent > > > inopportune migrations, which needs to be implemented in assembler anyway. > > > The counter reads would have to use READ_ONCE(). > > > > Fair enough! > > > > > - The per-thread counter (Thread-Local Storage) incremented by a single > > > thread, read by various threads concurrently, is a good target > > > for WRITE_ONCE()/READ_ONCE() pairing. This is actually what we do in > > > various liburcu implementations which track read-side critical sections > > > per-thread. > > > > Agreed, but do any of these use WRITE_ONCE(x, READ_ONCE(x) + 1) or > > similar? > > Not quite, I recall it's more like WRITE_ONCE(x, READ_ONCE(y)) or such, > so we can grab the value of the current gp counter and store it into a > TLS variable. Good point, you could use that pattern to grab a shared snapshot. Still sounds niche, but you never know! ;-) Thanx, Paul > > > - Same as for the kernel, a counter increment protected by a lock which > > > needs to be read from various threads concurrently without taking > > > the lock could be a valid use-case, though I fail to see how it is > > > useful due to lack of imagination on my part. ;-) > > > > In RCU, we have "WRITE_ONCE(*sp, *sp + 1)" for this use case, though > > here we have the WRITE_ONCE() but not the READ_ONCE() because we hold > > the lock excluding any other updates. > > You are right, the READ_ONCE() is not needed in this case for the > updater, only for the concurrent readers. > > Thanks, > > Mathieu > > -- > Mathieu Desnoyers > EfficiOS Inc. > https://www.efficios.com >