On Jul 13, 2015 9:27 AM, "Mathieu Desnoyers" <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> ----- On Jul 12, 2015, at 11:38 PM, Andy Lutomirski luto@xxxxxxxxxxxxxx wrote:
>
> > On Jul 12, 2015 12:06 PM, "Mathieu Desnoyers"
> > <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> >>
> >> Expose a new system call allowing threads to register a userspace memory
> >> area where to store the current CPU number. Scheduler migration sets the
> >> TIF_NOTIFY_RESUME flag on the current thread. Upon return to user-space,
> >> a notify-resume handler updates the current CPU value within that
> >> user-space memory area.
> >>
> >> This getcpu cache is an alternative to the sched_getcpu() vdso which has
> >> a few benefits:
> >> - It is faster to do a memory read than to call a vDSO,
> >> - This cache value can be read from within an inline assembly, which
> >>   makes it a useful building block for restartable sequences.
> >>
> >
> > Let's wait and see what the final percpu atomic solution is. If it
> > involves percpu segments, then this is unnecessary.
>
> percpu segments will likely not solve everything. I have a use-case
> with dynamically allocated per-cpu ring buffer in user-space (lttng-ust)
> which can be a challenge for percpu segments. Having a fast getcpu()
> is a win in those cases.
>

Even so, percpu segments will give you fast getcpu without introducing
a new scheduler hook.

> >
> > Also, this will need to be rebased onto -tip, and that should wait
> > until the big exit rewrite is farther along.
>
> I don't really care which thread flag it ends up using, and this is
> more or less an internal implementation detail. The important part is
> the ABI exposed to user-space, and it's good to start the discussion
> on this aspect early.
>

Agreed.

> >
> >> This approach is inspired by Paul Turner and Andrew Hunter's work
> >> on percpu atomics, which lets the kernel handle restart of critical
> >> sections:
> >> Ref.:
> >> * https://lkml.org/lkml/2015/6/24/665
> >> * https://lwn.net/Articles/650333/
> >> * http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf
> >>
> >> Benchmarking sched_getcpu() vs tls cache approach. Getting the
> >> current CPU number:
> >>
> >> - With Linux vdso: 12.7 ns
> >
> > This is a bit unfair, because the glibc wrapper sucks and the
> > __vdso_getcpu interface is overcomplicated. We can fix it with a
> > better API. It won't make it *that* much faster, though.
>
> Even if we improve the vDSO function, we are at a point where just
> the function call is not that cheap.
>

True, and the LSL isn't likely to go away. The branches can go, though.

--Andy
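
For readers following the thread, the two read paths being compared can be
sketched roughly as below. This is only an illustration, not the patch's
ABI: the names cpu_cache, cpu_cache_refresh() and cpu_cache_read() are
hypothetical, and the cache here is refreshed by hand from user space,
whereas the proposal would have the kernel update the registered area on
migration. The only real interface used is glibc's sched_getcpu().

```c
#define _GNU_SOURCE             /* glibc exposes sched_getcpu() under this macro */
#include <sched.h>

/*
 * Hypothetical per-thread cache of the CPU number. In the proposal the
 * kernel would keep a registered user-space area up to date; in this
 * sketch user code has to refresh it explicitly, so the value can be stale.
 */
static __thread volatile int cpu_cache = -1;

/* Slow path: ask glibc (normally backed by the vDSO) and store the answer. */
static int cpu_cache_refresh(void)
{
    int cpu = sched_getcpu();   /* returns -1 on failure */

    if (cpu >= 0)
        cpu_cache = cpu;
    return cpu;
}

/* Fast path: a plain load from thread-local storage, no function call. */
static inline int cpu_cache_read(void)
{
    return cpu_cache;
}
```

Without kernel support the fast path is only as fresh as the last call to
cpu_cache_refresh(), which is exactly the gap the proposed notify-resume
update is meant to close.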