On Thu, Feb 25, 2016 at 05:17:51PM +0000, Mathieu Desnoyers wrote:
> ----- On Feb 25, 2016, at 12:04 PM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:
>
> > On Thu, Feb 25, 2016 at 04:55:26PM +0000, Mathieu Desnoyers wrote:
> >> ----- On Feb 25, 2016, at 4:56 AM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:
> >> The restartable sequences are intrinsically designed to work
> >> on per-cpu data, so they need to fetch the current CPU number
> >> within the rseq critical section. This is where the getcpu_cache
> >> system call becomes very useful when combined with rseq:
> >> getcpu_cache allows reading the current CPU number in a
> >> fraction of a cycle.
> >
> > Yes yes, I know how restartable sequences work.
> >
> > But what I worry about is that they want a cpu number and a sequence
> > number, and for performance it would be very good if those live in the
> > same cacheline.
> >
> > That means either getcpu needs to grow a seq number, or restartable
> > sequences need to _also_ provide the cpu number.
>
> If we plan things well, we could have both the cpu number and the
> seqnum in the same cache line, registered by two different system
> calls. It's up to user-space to organize those two variables
> to fit within the same cache-line.

I feel this is more fragile than needed. Why not do a single system call
that does both?

> getcpu_cache GETCPU_CACHE_SET operation takes the address where
> the CPU number should live as input.
>
> rseq system call could do the same for the seqnum address.

So I really don't like that, that means we have to track more kernel
state -- we have to carry two pointers instead of one, we have to have
more update functions etc.

That just increases the total overhead of all of this.

> The question becomes: how do we introduce this to user-space,
> considering that only a single address per thread is allowed
> for each of getcpu_cache and rseq ?
>
> If both CPU number and seqnum are centralized in a TLS within
> e.g. glibc, that would be OK, but if we intend to allow libraries
> or applications to directly register their own getcpu_cache
> address and/or rseq, we may end up in situations where we have
> to fall back on using two different cache-lines. But how much
> should we care about performance in cases where non-generic
> libraries directly use those system calls ?
>
> Thoughts ?

Yeah, not sure, but that is a separate problem. Both your proposed code
and the rseq code have this.

Having them separate system calls just increases the amount of ways you
can do it wrong.
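
For illustration only, here is a minimal C sketch of the layout the
thread is arguing about: a single per-thread area holding both the CPU
number and the sequence number, so that one registered address covers
both and they are guaranteed to share a cache line. Everything named
here (struct thread_percpu_area, CACHELINE_SIZE, same_cacheline) is
hypothetical and is not the ABI of getcpu_cache or rseq; the 64-byte
cache line size is an assumption that holds on common x86-64 parts but
not everywhere.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Assumption: 64-byte cache lines (typical on x86-64, not universal). */
#define CACHELINE_SIZE 64

/*
 * Hypothetical layout, not an actual kernel ABI: one per-thread area
 * that the kernel would keep up to date, with both fields adjacent.
 */
struct thread_percpu_area {
    alignas(CACHELINE_SIZE) uint32_t cpu_id; /* current CPU number */
    uint32_t seqnum;                         /* restart sequence counter */
};

/* Compile-time check: seqnum ends within the struct's first cache line. */
static_assert(offsetof(struct thread_percpu_area, seqnum) + sizeof(uint32_t)
                  <= CACHELINE_SIZE,
              "cpu_id and seqnum must share a cache line");

/* One instance per thread; only this single address would be registered. */
static _Thread_local struct thread_percpu_area percpu_area;

/* Returns 1 if both addresses fall within the same cache line, else 0. */
static int same_cacheline(const void *a, const void *b)
{
    return ((uintptr_t)a / CACHELINE_SIZE) == ((uintptr_t)b / CACHELINE_SIZE);
}
```

With two separately registered addresses, nothing like the static_assert
above can be enforced by the interface itself; co-location then depends
on every library and application laying out its own variables correctly.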