----- On Mar 28, 2018, at 11:22 AM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:

> On Tue, Mar 27, 2018 at 12:05:31PM -0400, Mathieu Desnoyers wrote:
>
>> 1) Allow algorithms to perform per-cpu data migration without relying on
>> sched_setaffinity()
>>
>> The use-cases are migrating memory between per-cpu memory free-lists, or
>> stealing tasks from other per-cpu work queues: each require that
>> accesses to remote per-cpu data structures are performed.
>
> I think that one completely reduces to the per-cpu (spin)lock case,
> right? Because, as per the below, your logging case (8) can 'easily' be
> done without the cpu_opv monstrosity.
>
> And if you can construct a per-cpu lock, that can be used to construct
> arbitrary logic.

The per-cpu spinlock does not have the same performance characteristics
as lock-free alternatives for various operations. A rseq compare-and-store
is faster than a rseq spinlock for linked-list operations.

>
> And the difficult case for the per-cpu lock is the remote acquire; all
> the other cases are (relatively) trivial.
>
> I've not really managed to get anything sensible to work, I've tried
> several variations of split lock, but you invariably end up with
> barriers in the fast (local) path, which sucks.
>
> But I feel this should be solvable without cpu_opv. As in, I really hate
> that thing ;-)

I have not developed cpu_opv out of any kind of love for that solution.
I just realized that it did solve all my issues after failing for quite
some time to implement acceptable solutions for the remote access problem,
and for ensuring progress of single-stepping with current debuggers that
don't know about the rseq_table section.
>
>> 8) Allow libraries with multi-part algorithms to work on same per-cpu
>> data without affecting the allowed cpu mask
>>
>> The lttng-ust tracer presents an interesting use-case for per-cpu
>> buffers: the algorithm needs to update a "reserve" counter, serialize
>> data into the buffer, and then update a "commit" counter _on the same
>> per-cpu buffer_. Using rseq for both reserve and commit can bring
>> significant performance benefits.
>>
>> Clearly, if rseq reserve fails, the algorithm can retry on a different
>> per-cpu buffer. However, it's not that easy for the commit. It needs to
>> be performed on the same per-cpu buffer as the reserve.
>>
>> The cpu_opv system call solves that problem by receiving the cpu number
>> on which the operation needs to be performed as argument. It can push
>> the task to the right CPU if needed, and perform the operations there
>> with preemption disabled.
>>
>> Changing the allowed cpu mask for the current thread is not an
>> acceptable alternative for a tracing library, because the application
>> being traced does not expect that mask to be changed by libraries.
>
> We talked about this use-case, and it can be solved without cpu_opv if
> you keep a dual commit counter, one local and one (atomic) remote.

Right.

>
> We retain the cpu_id from the first rseq, and the second part will, when
> it (unlikely) finds it runs remotely, do an atomic increment on the
> remote counter. The consumer of the counter will then have to sum both
> the local and remote counter parts.

Yes, I did a prototype of this specific case with split-counters a while
ago. However, if we need cpu_opv as fallback for other reasons (e.g.
remote accesses), then the split-counters are not needed, and there is no
need to change the layout of user-space data to accommodate the extra
per-cpu counter.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
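
For readers following the thread, the dual commit counter discussed above (one local part, one atomic remote part, summed by the consumer) could look roughly like the C11 sketch below. This is not the prototype mentioned in the reply and not lttng-ust code. All names (struct commit_counter, commit_bytes, read_commit, NR_CPUS) are hypothetical, and the local fast path only stands in for an rseq critical section: here it is a plain load followed by a store, which is safe only under the assumption that nothing else updates the local part concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS 4	/* hypothetical fixed CPU count for this sketch */

/* Hypothetical per-cpu commit state, split into local and remote parts. */
struct commit_counter {
	_Atomic uint64_t local;		/* updated only from the owning CPU */
	_Atomic uint64_t remote;	/* updated by tasks that migrated after reserve */
};

struct commit_counter commit[NR_CPUS];

/*
 * Commit `len` bytes to the buffer of `reserved_cpu`, the cpu_id retained
 * from the reserve step. `current_cpu` is the CPU the task runs on now.
 */
void commit_bytes(int reserved_cpu, int current_cpu, uint64_t len)
{
	assert(reserved_cpu >= 0 && reserved_cpu < NR_CPUS);
	struct commit_counter *c = &commit[reserved_cpu];

	if (current_cpu == reserved_cpu) {
		/*
		 * Fast path. Assumption: a real implementation would do this
		 * increment inside an rseq critical section. The sketch uses a
		 * load and a store, with no atomic read-modify-write.
		 */
		uint64_t v = atomic_load_explicit(&c->local, memory_order_relaxed);
		atomic_store_explicit(&c->local, v + len, memory_order_release);
	} else {
		/* Unlikely slow path: atomic increment on the remote part. */
		atomic_fetch_add_explicit(&c->remote, len, memory_order_release);
	}
}

/* Consumer side: the committed total is the sum of both parts. */
uint64_t read_commit(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	struct commit_counter *c = &commit[cpu];

	return atomic_load_explicit(&c->local, memory_order_acquire) +
	       atomic_load_explicit(&c->remote, memory_order_acquire);
}
```

As an example of use: a task that reserved on CPU 1 and is still on CPU 1 calls commit_bytes(1, 1, len) and takes the fast path. If it was migrated to CPU 3 in between, it calls commit_bytes(1, 3, len) and the bytes land in the remote part of CPU 1's counter. The cost is the extra per-cpu counter in the user-space data layout, which is the layout change the reply above would rather avoid.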