On January 12, 2016 4:22:29 PM PST, Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>----- On Jan 12, 2016, at 4:02 PM, Ben Maurer bmaurer@xxxxxx wrote:
>
>>> One idea I have would be to let the kernel reserve some space either after the
>>> first stack address (for a stack growing down) or at the beginning of the
>>> allocated TLS area for each thread in copy_thread_tls() by fiddling with
>>> sp or the tls base address when creating a thread.
>>
>> Could this be implemented by having glibc use a well known symbol name to define
>> the per-thread TLS area? If an high performance application wants to avoid any
>> relocations in accessing this variable it would define it and that definition
>> would override glibc's. This is how things work with malloc. glibc has a
>> default malloc implementation but we link jemalloc directly into our binaries.
>> in addition to changing the malloc implementation this means that calls to
>> malloc don't go through the PLT.
>
>Just to make sure I understand your proposal: defining a well known symbol
>with a weak attribute in glibc (or bionic...), e.g.:
>
>int32_t __thread __attribute__((weak)) __getcpu_cache;
>
>so that applications which care about bypassing the PLT can override it with:
>
>int32_t __thread __getcpu_cache;
>
>glibc/bionic would be responsible for calling the getcpu_cache() system call
>to register/unregister this TLS variable for each thread.
>
>One thing I would like to figure out is whether we can use this in a way that
>would allow introducing getcpu_cache() into applications and libraries
>(e.g. lttng-ust tracer) before it gets implemented into glibc, in a way that
>would keep forward compatibility for whenever it gets introduced in glibc.
>
>We can declare __getcpu_cache as a weak symbol in arbitrary libraries, and
>make them register/unregister the cache through the getcpu_cache syscall.
>
>The main thing that I would need to tweak at the kernel level within the
>system call would be to keep a refcount of the number of times the
>__getcpu_cache is registered per thread. This would allow multiple registrations,
>one per library (e.g. lttng-ust) and one for glibc, but we would validate
>that they all register the exact same address for a given thread.
>
>The reference counting trick should also work for cases where applications
>define a non-weak __getcpu_cache, and want to call the getcpu_cache
>system call to register it themselves (before glibc adds support for it).

This seems like something better done in a tiny common library, rather than the kernel or by playing symbol resolution games.
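
To make the "tiny common library" alternative concrete, here is a minimal sketch in C of the per-thread reference counting described in the quoted text, done in userspace instead of in the kernel. The function names `getcpu_cache_lib_register()` and `getcpu_cache_lib_unregister()` are hypothetical and are not part of glibc, bionic, or any kernel interface. The sketch makes no system call; it only marks where a real library would presumably register or unregister the address with the kernel. It assumes GCC or Clang on an ELF target for `__thread` and the weak attribute.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Weak default definition of the well-known TLS symbol from the thread.
 * A non-weak definition in the application would override it at link time. */
__attribute__((weak)) __thread int32_t __getcpu_cache;

/* Per-thread bookkeeping kept by the hypothetical helper library. */
static __thread int32_t *registered_addr;   /* address currently registered */
static __thread unsigned int registered_refs; /* number of active registrations */

/* Register addr for the calling thread. Repeated registrations of the same
 * address only bump the refcount; a different address is rejected. */
int getcpu_cache_lib_register(int32_t *addr)
{
    if (addr == NULL)
        return -EINVAL;
    if (registered_refs > 0) {
        assert(registered_addr != NULL);
        if (registered_addr != addr)
            return -EBUSY;      /* all users must agree on one address */
        registered_refs++;
        return 0;
    }
    /* First registration for this thread: a real library would issue
     * the registration system call here (assumption, not shown). */
    registered_addr = addr;
    registered_refs = 1;
    return 0;
}

/* Drop one registration; the last one clears the per-thread state. */
int getcpu_cache_lib_unregister(int32_t *addr)
{
    if (registered_refs == 0 || registered_addr != addr)
        return -EINVAL;
    if (--registered_refs == 0) {
        /* Last user gone: a real library would unregister with the
         * kernel here (assumption, not shown). */
        registered_addr = NULL;
    }
    return 0;
}
```

With this shape, each library (and eventually the C library) would call the register function once per thread with `&__getcpu_cache`, and only the first call per thread would need to reach the kernel.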