>> I don't see why we can't stick this directly into struct rseq because >> it's all public anyway. > > The motivation for moving this to a different cache line is to handle > the prior comment from Boqun, who is concerned that busy-waiting > repeatedly loading a field from struct rseq will cause false-sharing and > make other stores to that cache line slower, especially stores to > rseq_cs to begin rseq critical sections, thus slightly increasing the > overhead of rseq critical sections taken while mutexes are held. > > If we want to embed this field into struct rseq with its own cache line, > then we need to add a lot of padding, which is inconvenient. > > That being said, perhaps this is premature optimization, what do you think ? Hi Mathieu, Florian, This is exciting! I thought the motivation for moving rseq_sched_state out of struct rseq is lifetime management problem. I assume when a thread locks a mutex, it stores pointer to rseq_sched_state in the mutex state for other threads to poll. So the waiting thread would do something along the following lines: rseq_sched_state* state = __atomic_load_n(mutex->sched_state, __ATOMIC_RELAXED); if (state && !(state->state & RSEQ_SCHED_STATE_FLAG_ON_CPU)) futex_wait(); Now if the state is struct rseq, which is stored in TLS, then the owning thread can unlock the mutex, exit and unmap TLS in between. Consequently, load of state->state will cause a paging fault. And we do want rseq in TLS to save 1 indirection. If rseq_sched_state is separated from struct rseq, then it can be allocated in type stable memory that is never unmapped. What am I missing here? However, if we can store this state in struct rseq, then an alternative interface would for the kernel to do: rseq->cpu_id = -1; to denote that the thread is not running on any CPU. I think it kinda makes sense, rseq->cpu_id is the thread's current CPU, and -1 naturally means "not running at all". And we already store -1 right after init, so it shouldn't be a surprising value.