On 2024-03-06 22:37, Paul E. McKenney wrote:
On Wed, Mar 06, 2024 at 10:06:21PM -0500, Mathieu Desnoyers wrote:
[...]
As far as the WRITE_ONCE(x, READ_ONCE(x) + 1) pattern is concerned, the only valid use-case I can think of is split counters or RCU implementations where there is a single updater doing the increment, and one or more concurrent reader threads that need to snapshot a consistent value with READ_ONCE().
[...]
So what would you use that pattern for? One possibility is a per-CPU statistical counter in userspace on a fastpath, in cases where losing the occasional count is OK. Then learning your CPU (and possibly being immediately migrated to some other CPU), READ_ONCE() of the count, increment, and WRITE_ONCE() might (or might not) make sense. I suppose the same in the kernel if there was a fastpath so extreme you could not afford to disable preemption. At best, very niche. Or am I suffering a failure of imagination yet again? ;-)
The (niche) use-cases I have in mind are split-counters and RCU grace period tracking, where precise counters totals are needed (no lost count). In the kernel, this could be: - A per-cpu counter, each counter incremented from thread context with preemption disabled (single updater per counter), read concurrently by other threads. WRITE_ONCE/READ_ONCE is useful to make sure there is no store/load tearing there. Atomics on the update would be stronger than necessary on the increment fast-path. - A per-thread counter (e.g. within task_struct), only incremented by the single thread, read by various threads concurrently. - A counter which increment happens to be already protected by a lock, read by various threads without taking the lock. (technically doable, but I'm not sure I see a relevant use-case for it) In user-space: - The "per-cpu" counter would have to use rseq for increments to prevent inopportune migrations, which needs to be implemented in assembler anyway. The counter reads would have to use READ_ONCE(). - The per-thread counter (Thread-Local Storage) incremented by a single thread, read by various threads concurrently, is a good target for WRITE_ONCE()/READ_ONCE() pairing. This is actually what we do in various liburcu implementations which track read-side critical sections per-thread. - Same as for the kernel, a counter increment protected by a lock which needs to be read from various threads concurrently without taking the lock could be a valid use-case, though I fail to see how it is useful due to lack of imagination on my part. ;-) I'm possibly missing other use-cases, but those come to mind as not involving racy counter increments. I agree that use-cases are so niche that we probably do _not_ want to introduce ADD_SHARED() for that purpose in a common header file, because I suspect that it would then become misused in plenty of scenarios where the updates are actually racy and would be better served by atomics or local-atomics. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com