----- On Apr 7, 2016, at 4:22 PM, Andi Kleen andi@xxxxxxxxxxxxxx wrote:

>> One basic use of cpu id cache is to speed up the sched_getcpu(3)
>> implementation in glibc. This is why I'm proposing it as a stand-alone
>
> I don't think rseq is needed for faster getcpu.

I agree that rseq is not needed for faster getcpu. This is why I was
proposing to make the "cpu_id" feature configurable separately from the
rseq feature. E.g. a kernel configuration that doesn't want to take the
hit of rseq handling in signal delivery and preemption could just enable
the cpu_id feature, and thus only needs to add work in the migration code
path and when returning to userspace. Also, if a thread only registers
the cpu_id feature, the kernel can skip the rseq code quickly in signal
delivery and preemption too.

>
> User space has to be able handle stale return values anyways, as it
> has no way to lock itself to a cpu while it is using the return value.
> So it can be only a hint.
>
> The original version of getcpu just had a jiffies based cache. The CPU
> value was valid up to a jiffie (the next time jiffie changes), and then it
> gets looked up again.
>
> Processes are unlikely to switch CPUs more often than a jiffie, so it's
> good enough as a hint.

One example use-case where this would hurt: we use the CPU id heavily
when tracing to a ring buffer in user-space. Having one event written
into the wrong buffer once in a while is not a big deal, but tracing a
whole burst of events within a jiffy (e.g. 4ms at 250Hz) to the wrong cpu
buffer whenever the thread migrates is really an unwanted side-effect
latency-wise.

>
> This doesn't need any new kernel interfaces at all because jiffies is already
> exported to the vdso.

My understanding is that your assumptions about the availability of
those features in the vdso are true for x86 32/64, but do not currently
apply to ARM32. ARM32 is my main target architecture for the CPU id
cache work.
x86 32/64 simply happen to benefit from that work too (see my benchmark
numbers in the changelog of patch 1/5).

> It just needs a new entry point into the vdso that handles the jiffie
> check.

This would likely require extending the ARM vdso page to expose the
jiffies counter to user-space, and updating user-space libraries to use
this counter in sched_getcpu. But it would still be slower than the
cpu_id cache I propose, due to the required function call to
sched_getcpu, unless you want to open-code the jiffies check within all
applications as an ABI. It would also be bad for fast bursts of cpu id
use (e.g. per-cpu ring buffers).

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
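
To illustrate the burst argument above, here is a minimal user-space
sketch in C that simulates both read paths. It is only a model under
stated assumptions: sim_jiffies, sim_running_cpu, sim_migrate() and the
cpu_id_cache variable are hypothetical stand-ins, not real kernel, vdso
or glibc symbols. The thread migrates in the middle of a jiffy and then
writes a burst of events; the function counts how many events would land
in the wrong per-cpu buffer.

```c
#include <assert.h>

/* Simulated kernel state (hypothetical stand-ins, not real symbols). */
static unsigned long sim_jiffies;  /* tick counter a vdso could export */
static int sim_running_cpu;        /* CPU the thread is really running on */

/* Approach 1: jiffies-based cache, refreshed only when the tick changes. */
static unsigned long cached_jiffies;
static int cached_cpu;

static int getcpu_jiffies_cache(void)
{
	if (cached_jiffies != sim_jiffies) {
		cached_cpu = sim_running_cpu;  /* models the slow lookup */
		cached_jiffies = sim_jiffies;
	}
	return cached_cpu;
}

/*
 * Approach 2: cpu_id cache. Models a per-thread word that the kernel
 * rewrites on migration, before the thread returns to user-space.
 */
static volatile int cpu_id_cache;

static int getcpu_cpu_id_cache(void)
{
	return cpu_id_cache;  /* a plain load, no tick check */
}

/* Models a migration: both the real CPU and the cached word change. */
static void sim_migrate(int cpu)
{
	sim_running_cpu = cpu;
	cpu_id_cache = cpu;
}

/*
 * Warm the cache on CPU 0, migrate to CPU 1 within the same jiffy, then
 * write nr_events events. Returns how many events picked the wrong CPU.
 */
static int misplaced_in_burst(int (*getcpu)(void), int nr_events)
{
	int i, misplaced = 0;

	assert(nr_events >= 0);
	sim_jiffies = 0;
	cached_jiffies = (unsigned long)-1;  /* force the first lookup */
	sim_migrate(0);
	(void)getcpu();                      /* warm the cache on CPU 0 */
	sim_migrate(1);                      /* no tick has elapsed yet */
	for (i = 0; i < nr_events; i++)
		if (getcpu() != sim_running_cpu)
			misplaced++;
	return misplaced;
}
```

In this model, misplaced_in_burst(getcpu_jiffies_cache, n) reports every
event of the burst as misplaced, because the cached value is not looked
up again until the tick changes, while
misplaced_in_burst(getcpu_cpu_id_cache, n) reports none. A real thread
can still migrate between the load and its use, so the value stays a
hint in both cases.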