----- On Apr 7, 2016, at 8:25 AM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:

> On Thu, Apr 07, 2016 at 02:03:53PM +0200, Florian Weimer wrote:
>> > struct tlabi {
>> >     union {
>> >         __u8[64] __foo;
>> >         struct {
>> >             /* fields go here */
>> >         };
>> >     };
>> > } __aligned__(64);
>>
>> That's not really “fixed size” as far as an ABI is concerned, due to the
>> possibility of future extensions.
>
> sizeof(struct tlabi) is always the same, right? How is that not fixed?
>
>> > People objected against the fixed size scheme, but it being possible to
>> > get a fixed TCB offset and reduce indirections is a big win IMO.
>>
>> It's a difficult trade-off. It's not an indirection as such, it's avoid
>> loading the dynamic TLS offset.
>
> What we _want_ is being able to use %[gf]s:offset and have it work (I
> forever forget which segment register userspace TLS uses).
>
>> Let me repeat that the ELF TLS GNU ABI has very limited support for
>> static offsets at present, and it is difficult to make them available
>> more widely without code generation at run time (in the form of text
>> relocations, but still).
>
> Do you have a pointer to something I can read? Because I'm clearly not
> understanding the full issue here.

For what it is worth, here are a couple of objdump snippets of my test
program, without and with -fPIC:

* Compiled with -O2, *without* -fPIC, x86-64:

__thread __attribute__((weak))
volatile struct thread_local_abi __thread_local_abi;

static
int32_t read_cpu_id(void)
{
        if (unlikely(!(__thread_local_abi.features & TLABI_FEATURE_CPU_ID)))
  40064e:       64 8b 04 25 c0 ff ff    mov    %fs:0xffffffffffffffc0,%eax
  400655:       ff
  400656:       a8 01                   test   $0x1,%al
  400658:       74 71                   je     4006cb <main+0xab>
                return sched_getcpu();
        return __thread_local_abi.cpu_id;
  40065a:       64 8b 14 25 c4 ff ff    mov    %fs:0xffffffffffffffc4,%edx
  400661:       ff
}

* Compiled with -O2, with -fPIC, x86-64:

__thread __attribute__((weak))
volatile struct thread_local_abi __thread_local_abi;
  4006de:       64 48 8b 04 25 00 00    mov    %fs:0x0,%rax
  4006e5:       00 00

static
int32_t read_cpu_id(void)
{
        if (unlikely(!(__thread_local_abi.features & TLABI_FEATURE_CPU_ID)))
  4006e7:       48 8d 80 c0 ff ff ff    lea    -0x40(%rax),%rax
  4006ee:       8b 10                   mov    (%rax),%edx
  4006f0:       83 e2 01                and    $0x1,%edx
  4006f3:       0f 84 80 00 00 00       je     400779 <main+0xc9>
                return sched_getcpu();
        return __thread_local_abi.cpu_id;
  4006f9:       8b 50 04                mov    0x4(%rax),%edx
}

So with -fPIC (libraries), TLS adds an extra indirection: the thread
pointer is first loaded from %fs:0x0. However, it only needs to load that
base address once, and can then access both the "features" and "cpu_id"
fields as offsets from that base.

For executables compiled without -fPIC, there is no indirection: both
fields are read directly at fixed %fs-relative offsets.

This justifies the following paragraph in the proposed man page:

    The symbol __thread_local_abi is recommended to be used across
    libraries and applications wishing to register the thread-local
    ABI structure for tlabi_nr 0. The attribute "weak" is recommended
    when declaring this variable in libraries. Applications can
    choose to define their own version of this symbol without the
    weak attribute as a performance improvement.

Thoughts?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
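
For anyone who wants to reproduce the disassembly above, here is a minimal sketch of the kind of source that produces it. The structure layout is an assumption inferred from the objdump output ("features" at offset 0, "cpu_id" at offset 4, 64 bytes total, 64-byte aligned); the real struct thread_local_abi may differ. The value of TLABI_FEATURE_CPU_ID is taken from the "test $0x1" instruction, and the "pad" member name is made up for illustration. Nothing in this sketch registers the structure with the kernel, so "features" stays zero and the sched_getcpu() fallback is taken unless the fields are filled in by hand.

```c
#define _GNU_SOURCE
#include <assert.h>   /* static_assert */
#include <sched.h>    /* sched_getcpu() */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Assumed from "test $0x1,%al" in the disassembly. */
#define TLABI_FEATURE_CPU_ID (1U << 0)

/*
 * Hypothetical layout, inferred from the offsets in the objdump output:
 * a fixed 64-byte, 64-byte-aligned area with "features" at offset 0 and
 * "cpu_id" at offset 4. "pad" is an illustrative name, not a real field.
 */
struct thread_local_abi {
	union {
		uint8_t pad[64];
		struct {
			uint32_t features;
			uint32_t cpu_id;
		};
	};
} __attribute__((aligned(64)));

/* The size stays fixed as long as the fields fit within the padding. */
static_assert(sizeof(struct thread_local_abi) == 64, "fixed-size area");
static_assert(offsetof(struct thread_local_abi, cpu_id) == 4, "cpu_id offset");

/* Weak, as recommended for libraries; an application may define it strong. */
__thread __attribute__((weak))
volatile struct thread_local_abi __thread_local_abi;

static int32_t read_cpu_id(void)
{
	/* Fall back to sched_getcpu() when the CPU ID feature bit is not set. */
	if (__builtin_expect(!(__thread_local_abi.features & TLABI_FEATURE_CPU_ID), 0))
		return sched_getcpu();
	return (int32_t)__thread_local_abi.cpu_id;
}
```

Building this with "gcc -O2" and then with "gcc -O2 -fPIC", and running "objdump -dS" on each result, should show the same difference in how the TLS base is reached, though exact addresses and register choices will vary with the toolchain.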