On Tue, May 26, 2015 at 2:18 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Tue, May 26, 2015 at 2:04 PM, Mathieu Desnoyers
>>
>>> It's too bad that not all architectures have a single-instruction
>>> unlocked compare-and-exchange.
>>
>> Based on my benchmarks, it's not clear that single-instruction
>> unlocked CAS is actually faster than doing the same with many
>> instructions.
>
> True, but with a single instruction the user can't get preempted in the middle.
>
> Looking at your code, it looks like percpu_user_sched_in has some
> potentially nasty issues with page faults. Avoiding touching user
> memory from the scheduler would be quite nice from an implementation
> POV, and the x86-specific gs hack wins in that regard.

ARM has "TLB lockdown entries", which could, I think, be used to
implement per-cpu or per-thread mappings. I'm actually rather
surprised that Linux doesn't already use a TLB lockdown entry for TLS.
(Hmm. Maybe it's because the interface to write the entries requires
actually touching the page. Maybe not; the ARM docs, in general, seem
to be much less clear than the Intel and AMD docs.)

ARM doesn't seem to have any single-instruction compare-exchange or
similar instruction, though, so this might not be all that useful.
On the other hand, ARM can probably do reasonably efficient per-cpu
memory allocation and such with a single ldrex/strex pair.

--Andy
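To make the ldrex/strex remark concrete, here is a minimal sketch (not from this thread) of a bump allocator for one per-CPU arena, written with C11 atomics. It assumes a hypothetical fixed-size arena per CPU; `struct arena`, `arena_alloc`, `ARENA_SIZE` and `ARENA_ALIGN` are illustrative names, not kernel or library APIs. `atomic_compare_exchange_weak` is the standard C11 call; on 32-bit ARMv7, compilers typically lower it to an ldrex/strex retry sequence, while on x86 they emit `lock cmpxchg`, the locked form rather than the unlocked per-cpu one discussed above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-CPU arena: names and size are illustrative only. */
#define ARENA_SIZE 4096
#define ARENA_ALIGN 16

struct arena {
    _Alignas(ARENA_ALIGN) unsigned char buf[ARENA_SIZE];
    _Atomic size_t used; /* bytes handed out so far; only ever grows */
};

/* Bump-allocate `size` bytes, rounded up to ARENA_ALIGN.
 * Returns NULL if the arena cannot satisfy the request. */
static void *arena_alloc(struct arena *a, size_t size)
{
    if (size == 0 || size > ARENA_SIZE)
        return NULL;
    size_t rounded = (size + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
    size_t old = atomic_load(&a->used);
    for (;;) {
        assert(old <= ARENA_SIZE); /* invariant: never over-allocated */
        if (rounded > ARENA_SIZE - old)
            return NULL; /* not enough room left */
        /* The exchange succeeds only if `used` still equals `old`.
         * On failure (including a spurious failure, which LL/SC
         * hardware can report when the reservation is lost, e.g.
         * across a preemption) `old` is refreshed and we retry.
         * Because `used` only grows, success means nobody else
         * claimed [old, old + rounded) in the meantime. */
        if (atomic_compare_exchange_weak(&a->used, &old, old + rounded))
            return a->buf + old;
    }
}
```

The monotonically growing offset is a deliberate simplification: a CAS-based free-list pop can suffer from ABA, whereas here an unchanged `used` value really does mean nothing was allocated in between. Selecting the right arena for the current CPU, without being migrated partway through, is the part this sketch leaves out.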