On Tue, Jul 21, 2015 at 12:58:13PM +0000, Mathieu Desnoyers wrote:
> ----- On Jul 21, 2015, at 3:30 AM, Ondřej Bílka neleai@xxxxxxxxx wrote:
>
> > On Tue, Jul 21, 2015 at 12:25:00AM +0000, Mathieu Desnoyers wrote:
> >> >> Does it solve the Wine problem? If Wine uses gs for something and
> >> >> calls a function that does this, Wine still goes boom, right?
> >> >
> >> > So the advantage of just making a global segment descriptor available
> >> > is that it's not *that* expensive to just save/restore segments. So
> >> > either wine could do it, or any library users would do it.
> >> >
> >> > But anyway, I'm not sure this is a good idea. The advantage of it is
> >> > that the kernel support really is _very_ minimal.
> >>
> >> Considering that we'd at least also want this feature on ARM and
> >> PowerPC 32/64, and that the gs segment selector approach clashes with
> >> existing apps (wine), I'm not sure that implementing a gs segment
> >> selector based approach to cpu number caching would lead to an overall
> >> decrease in complexity if it leads to performance similar to those of
> >> portable approaches.
> >>
> >> I'm perfectly fine with architecture-specific tweaks that lead to
> >> fast-path speedups, but if we have to bite the bullet and implement
> >> an approach based on TLS and registering a memory area at thread start
> >> through a system call on other architectures anyway, it might end up
> >> being less complex to add a new system call on x86 too, especially if
> >> fast path overhead is similar.
> >>
> >> But I'm inclined to think that some aspect of the question eludes me,
> >> especially given the amount of interest generated by the gs-segment
> >> selector approach. What am I missing ?
> >>
> > As I wrote before you don't have to bite bullet as I said before. It
> > suffices to create 128k element array with cpu for each tid, make that
> > mmapable file and userspace could get cpu with nearly same performance
> > without hacks.
>
> I don't see how this would be acceptable on memory-constrained embedded
> systems. They have multiple cores, and performance requirements, so
> having a fast getcpu would be useful there (e.g. telecom industry),
> but they clearly cannot afford a 512kB table per process just for that.
>

That just means you need a more complicated API and implementation; the
idea stays the same. You would need syscalls register_cpuid_idx and
deregister_cpuid_idx that give you an index to use instead of the tid.
The kernel would need to handle many ids being registered for each
thread, and to resize the mmapped file inside those syscalls.
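
To make the idea concrete, here is a rough userspace model of it. This is
only a sketch: register_cpuid_idx/deregister_cpuid_idx do not exist in any
kernel, struct cpuid_table, table_grow() and model_sched_switch() are names
made up for illustration, and realloc() stands in for resizing the mmapped
file. The table grows only as indices are registered, so a process with a
handful of threads pays for a handful of 4-byte entries rather than for
128k of them (128k * 4 bytes being the 512kB mentioned above).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace model of the table the kernel would maintain and map read-only
 * into the process. Hypothetical: no such kernel interface exists. */
struct cpuid_table {
    int32_t *cpu;         /* cpu[idx]: CPU of the registered thread, -1 if unknown */
    unsigned char *used;  /* used[idx]: nonzero while idx is registered */
    size_t nslots;        /* current capacity, grows on demand */
};

/* Models resizing the mmapped file from inside the registration syscall. */
static int table_grow(struct cpuid_table *t, size_t new_slots)
{
    int32_t *cpu = realloc(t->cpu, new_slots * sizeof(*cpu));
    if (!cpu)
        return -1;
    t->cpu = cpu;

    unsigned char *used = realloc(t->used, new_slots);
    if (!used)
        return -1;
    t->used = used;

    for (size_t i = t->nslots; i < new_slots; i++) {
        t->cpu[i] = -1;
        t->used[i] = 0;
    }
    t->nslots = new_slots;
    return 0;
}

/* Hypothetical syscall: hand out the lowest free index, growing if needed. */
static long register_cpuid_idx(struct cpuid_table *t)
{
    size_t idx;

    for (idx = 0; idx < t->nslots; idx++)
        if (!t->used[idx])
            break;
    if (idx == t->nslots) {
        size_t new_slots = t->nslots ? t->nslots * 2 : 4;
        if (table_grow(t, new_slots) != 0)
            return -1;
    }
    t->used[idx] = 1;
    t->cpu[idx] = -1;
    return (long)idx;
}

/* Hypothetical syscall: release an index so it can be reused. */
static int deregister_cpuid_idx(struct cpuid_table *t, long idx)
{
    if (idx < 0 || (size_t)idx >= t->nslots || !t->used[idx])
        return -1;
    t->used[idx] = 0;
    t->cpu[idx] = -1;
    return 0;
}

/* What the scheduler side would do when the thread lands on a CPU. */
static void model_sched_switch(struct cpuid_table *t, long idx, int32_t cpu)
{
    assert(idx >= 0 && (size_t)idx < t->nslots && t->used[idx]);
    t->cpu[idx] = cpu;
}

/* Userspace fast path: a single load, no syscall and no segment tricks. */
static inline int32_t fast_getcpu(const struct cpuid_table *t, long idx)
{
    return t->cpu[idx];
}
```

A thread would call register_cpuid_idx() once at start, keep the returned
index in TLS, and read its CPU with fast_getcpu() afterwards. A real
implementation would also have to deal with the mapping moving when the
file is resized, which this model ignores.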