----- On Jan 12, 2016, at 4:02 PM, Ben Maurer bmaurer@xxxxxx wrote:

>> One idea I have would be to let the kernel reserve some space either after the
>> first stack address (for a stack growing down) or at the beginning of the
>> allocated TLS area for each thread in copy_thread_tls() by fiddling with
>> sp or the tls base address when creating a thread.
>
> Could this be implemented by having glibc use a well known symbol name to define
> the per-thread TLS area? If an high performance application wants to avoid any
> relocations in accessing this variable it would define it and that definition
> would override glibc's. This is how things work with malloc. glibc has a
> default malloc implementation but we link jemalloc directly into our binaries.
> in addition to changing the malloc implementation this means that calls to
> malloc don't go through the PLT.

Just to make sure I understand your proposal: defining a well known symbol
with a weak attribute in glibc (or bionic...), e.g.:

    int32_t __thread __attribute__((weak)) __getcpu_cache;

so that applications which care about bypassing the PLT can override it with:

    int32_t __thread __getcpu_cache;

glibc/bionic would be responsible for calling the getcpu_cache() system call
to register/unregister this TLS variable for each thread.

One thing I would like to figure out is whether we can use this in a way
that would allow introducing getcpu_cache() into applications and libraries
(e.g. lttng-ust tracer) before it gets implemented into glibc, in a way that
would keep forward compatibility for whenever it gets introduced in glibc.

We can declare __getcpu_cache as a weak symbol in arbitrary libraries, and
make them register/unregister the cache through the getcpu_cache syscall.
The main thing that I would need to tweak at the kernel level within the
system call would be to keep a refcount of the number of times the
__getcpu_cache is registered per thread. This would allow multiple
registrations, one per library (e.g. lttng-ust) and one for glibc, but we
would validate that they all register the exact same address for a given
thread.

The reference counting trick should also work for cases where applications
define a non-weak __getcpu_cache, and want to call the getcpu_cache system
call to register it themselves (before glibc adds support for it).

Thoughts?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
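
To make the refcounting idea above concrete, here is a rough userspace sketch of the semantics I have in mind. It is only a model, not kernel code: model_register() and model_unregister() are hypothetical stand-ins for the register/unregister operations of the proposed getcpu_cache system call, struct cache_reg stands in for whatever per-thread state the kernel would keep, and the -EBUSY/-EINVAL return values are my assumptions rather than a settled ABI.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Weak TLS definition, as a library (lttng-ust, glibc, ...) would carry it.
 * An application can override it with a strong definition:
 *     int32_t __thread __getcpu_cache;
 */
__attribute__((weak)) __thread int32_t __getcpu_cache;

/* Hypothetical per-thread state the kernel would keep for the cache. */
struct cache_reg {
	int32_t *addr;		/* registered address, NULL if none */
	unsigned int refcount;	/* number of active registrations */
};

/* In this model a TLS variable plays the role of a per-task kernel field. */
static __thread struct cache_reg reg;

/* Model of "register": repeated registration of the same address is fine. */
int model_register(int32_t *addr)
{
	if (addr == NULL)
		return -EINVAL;
	/* Every registration for a given thread must use the same address. */
	if (reg.refcount > 0 && reg.addr != addr)
		return -EBUSY;	/* assumed error code */
	reg.addr = addr;
	reg.refcount++;
	return 0;
}

/* Model of "unregister": the cache stays active until the last user leaves. */
int model_unregister(int32_t *addr)
{
	if (reg.refcount == 0 || reg.addr != addr)
		return -EINVAL;	/* assumed error code */
	if (--reg.refcount == 0)
		reg.addr = NULL;
	return 0;
}

/* Walk through the scenario: one library plus glibc on the same thread. */
int run_demo(void)
{
	int32_t other = 0;

	assert(model_register(&__getcpu_cache) == 0);	/* e.g. lttng-ust */
	assert(model_register(&__getcpu_cache) == 0);	/* e.g. glibc */
	assert(model_register(&other) == -EBUSY);	/* address mismatch */

	assert(model_unregister(&__getcpu_cache) == 0);
	assert(reg.addr == &__getcpu_cache);		/* still registered */
	assert(model_unregister(&__getcpu_cache) == 0);
	assert(reg.addr == NULL);			/* last user is gone */
	return 0;
}
```

Since every registrant resolves the same symbol name, a library's weak definition and an application's strong one end up at a single per-thread address, which is why the address check should not reject legitimate users.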