On 9/28/23 16:54, Thomas Gleixner wrote:
On Mon, May 29 2023 at 15:14, Mathieu Desnoyers wrote:
+/*
+ * rseq_sched_state should be aligned on the cache line size.
+ */
+struct rseq_sched_state {
+ /*
+ * Version of this structure. Populated by the kernel, read by
+ * user-space.
+ */
+ __u32 version;
+ /*
+ * The state is updated by the kernel. Read by user-space with
+ * single-copy atomicity semantics. This field can be read by any
+ * userspace thread. Aligned on 32-bit. Contains a bitmask of enum
+ * rseq_sched_state_flags. This field is provided as a hint by the
+ * scheduler, and requires that the page holding this state is
+ * faulted-in for the state update to be performed by the scheduler.
+ */
+ __u32 state;
+ /*
+ * Thread ID associated with the thread registering this structure.
+ * Initialized by user-space before registration.
+ */
+ __u32 tid;
+};
+
/*
* struct rseq is aligned on 4 * 8 bytes to ensure it is always
* contained within a single cache-line.
@@ -148,6 +180,15 @@ struct rseq {
*/
__u32 mm_cid;
+ __u32 padding1;
+
+ /*
+ * Restartable sequences sched_state_ptr field. Initialized by
+ * userspace to the address at which the struct rseq_sched_state is
+ * located. Read by the kernel on rseq registration.
+ */
+ __u64 sched_state_ptr;
+
Why is this a separate entity instead of being simply embedded into
struct rseq?
Neither the code comment nor the changelog tells anything about that.
Here is the email thread from v1 that led to this:
https://lore.kernel.org/lkml/ZGevZxOjJLMO9zlM@boqun-archlinux/
The reason for moving this sched_state to its own structure was to
optimize for a scenario where we have:
- many threads contending for the lock, thus loading the value of the
struct rseq "sched_state".
- the lock owner thread doing rseq critical sections with the lock held,
thus updating the value of struct rseq "rseq_cs" field (in the same
cache line).
The loads of the sched_state from other threads would slow down (even
slightly) the update to the rseq_cs field, thus causing the critical
sections to take a few more cycles.
I am not convinced that the complexity vs speed tradeoff of moving this
into its own separate structure is worth it though. Especially given
that it would require userspace to wire things up with an additional
side-structure, rather than just extend naturally with the
extensible-rseq ABI. Another approach would be to cacheline-align this
field within struct rseq and waste space to padding.
I am inclined to just embed this into struct rseq without caring too
much about slight overhead caused by sharing the cache line with other
fields.
If your intention was to provide a solution for process shared futexes
then you completely failed to understand how process shared futexes
work. If not, then please explain why you need a pointer and the
associated hackery to deal with it.
I have not given a single thought to shared futexes in this PoC so far. :)
So let's see: we could do the following to support shared futexes: move
the information about the owner's struct rseq (could be the thread
pointer of the owner thread) to a field separate from the "lock" state
(which is ABI as you pointed out). Therefore, grabbing the lock would be
done with an atomic instruction, and then setting this extra information
about lock owner would be done with a simple store.
Given that the lock owner information is only used to statistically
decide whether other threads should spin or block when contended, we
don't really care if this information is set a few instructions after
acquiring the lock.
Thoughts ?
Thanks for the feedback,
Mathieu
Thanks,
tglx
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com