CodeSourcery has been investigating implementing TLS (Thread-Local
Storage) and NPTL (Native POSIX Thread Library) for ColdFire
processors. The proposed TLS ABI for ColdFire and m68k, including the
required kernel interfaces, is below; any comments?

We do not at present have a timescale for the implementation to be
available. Toolchain patches will probably be contributed to the
respective development mainlines in the usual order (first binutils,
then GCC, then glibc).

ColdFire and m68k TLS and NPTL ABI draft version 0.2
====================================================

For background reading on TLS, see Ulrich Drepper's document
<http://people.redhat.com/drepper/tls.pdf>.

Design choices
--------------

* There are no spare registers available to designate as the thread
  register. Therefore, kernel magic is needed to obtain the thread
  pointer from userspace. Kernel helpers are provided in a vDSO since
  they will need unwind information associated; see details below.
  Compiler-generated code will use an ABI-defined function
  __m68k_read_tp with that function handling the details of calling
  the vDSO.

* Use TLS variant I (TLS_DTV_AT_TP in glibc terms), where the TLS data
  goes after the TCB.

* The thread pointer points to 0x7000 (the value of TLS_TCB_OFFSET in
  glibc) after the start of the TLS data areas, as on Power and MIPS.
  This makes a greater amount of the data accessible with signed
  16-bit offsets from the thread pointer than with an unbiased
  pointer. (0x7000 is used instead of 0x8000 so that the TCB can also
  be accessed with 16-bit offsets from the thread pointer.)

* The DTP for a module points to 0x8000 (the value of TLS_DTV_OFFSET
  in glibc) after the start of the TLS data for that module, as on
  Power and MIPS.

* There are no linker optimizations to convert one TLS model into
  another; as such, the compiler can rearrange and optimize the
  instruction sequences shown. The relocations can be applied to
  extension words in many different instructions.

* The __tls_get_addr function is

    typedef struct
    {
      unsigned long int ti_module;
      unsigned long int ti_offset;
    } tls_index;

    extern void *__tls_get_addr (tls_index *ti);

* All the static relocations for offsets from GOT, DTP or TP are
  defined in 8-bit, 16-bit and 32-bit forms, similarly to existing
  m68k/ColdFire relocations. Both the 16-bit and 32-bit forms are
  likely to be of use in compiler-generated code.

Kernel helpers
--------------

This TLS ABI defines a function __m68k_read_tp, provided by libc.
This returns the thread pointer in register a0 (not d0) and may
clobber other call-clobbered registers. The compiler will generate
calls to this function for the initial exec and local exec models.

To implement this function and other requirements for NPTL, four
kernel helpers are to be provided in a vDSO (as provided by the kernel
on Power and other architectures). The symbols indicated are exported
at symbol version LINUX_2.6. Full DWARF unwind information for all
these functions must be included in the vDSO, as thread cancellation
may need to unwind from any point in any of these functions. The
kernel informs glibc of the location of the vDSO by putting an
AT_SYSINFO_EHDR entry in the auxiliary vector passed to each process.

If glibc is configured for a subset of processors where the necessary
operations do not require a kernel helper, then it does not need to
use the kernel helper (for example, glibc configured only for m68k
processors with a cas instruction does not need to use the
compare-and-exchange helper), but the kernel must provide all these
helpers on all m68k and ColdFire processors so that
lowest-common-denominator glibc binaries can work across all
processors.

The helper __kernel_read_tp returns the thread pointer in register a0
(not d0) and may clobber other call-clobbered registers.
(Because it is only called from __m68k_read_tp, which is called
through the PLT, and the resolver may clobber call-clobbered
registers, there seems to be no advantage in restricting clobbers from
this helper.)

Beyond the helper required for TLS, three further kernel helpers are
proposed for NPTL implementation: one to provide an atomic
compare-and-exchange operation (not available directly in the ColdFire
instruction set), one to provide a memory barrier (which can just
return to the user for non-SMP) and one to set the thread pointer.

The helper __kernel_atomic_cmpxchg_32 compares the 32-bit value at the
location pointed to by a0 with the value in d0. If the values are
equal, it writes the value in d1 to the location pointed to by a0;
otherwise, it writes the value at the location pointed to by a0 to d0.
In either case d0 therefore holds the original value of the memory
location on return. Apart from that modification of d0, it does not
clobber any registers other than the condition codes. (On m68k -
where this kernel helper would only be used if glibc is built for the
intersection of ColdFire and m68k - this could be implemented with a
single cas instruction and a return.)

The helper __kernel_atomic_barrier provides a memory barrier. It does
not clobber any registers other than the condition codes. On non-SMP,
it can just return to the user; on SMP it needs to ensure memory
synchronization between processors.

The helper __kernel_write_tp sets the thread pointer to the value in
a0. It does not clobber any registers other than the condition codes.

Offset length issues
--------------------

On ColdFire (and m68k before 68020), only 16-bit offsets can be used
in memory addresses. On m68k (68020 and later), 32-bit offsets can be
used; a ".w" assembly suffix is used for 16-bit offsets, and otherwise
the offsets are 32 bits.

The use of 16-bit offsets limits GOT size to 8192 entries (the
toolchain does not use negative GOT offsets on m68k/ColdFire).
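
The 8192-entry figure follows from simple arithmetic: a signed 16-bit
displacement that is never negative covers byte offsets 0 to 32767,
and each GOT entry occupies one 32-bit word. The following is only a
sketch of that calculation; the helper name got_entries_reachable and
the GOT_ENTRY_SIZE macro are invented for illustration, and the 4-byte
entry size is an assumption based on m68k being a 32-bit target.

```c
#include <assert.h>

/* Assumption: one GOT entry is a 32-bit word on m68k/ColdFire.  */
#define GOT_ENTRY_SIZE 4ul

/* Hypothetical helper: number of whole GOT entries addressable with a
   signed displacement of DISP_BITS bits when only non-negative
   displacements are used.  */
static unsigned long
got_entries_reachable (unsigned int disp_bits)
{
  assert (disp_bits >= 2 && disp_bits <= 32);
  /* Non-negative displacements span 0 .. 2^(bits-1) - 1 bytes.  */
  unsigned long span = 1ul << (disp_bits - 1);
  return span / GOT_ENTRY_SIZE;
}
```

With disp_bits set to 16 the span is 32768 bytes, which gives the 8192
entries quoted above.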
On m68k (68020 and later), GCC uses 32-bit offsets with -fPIC and
16-bit offsets with -fpic (and does not need to use GOT accesses for
non-PIC code at present). The proposals here do not address GOT size
limitations, although an example is given to illustrate a possible
longer access sequence to avoid those limitations on ColdFire.

The examples using offsets such as #x@TLSGD in GOT accesses are shown
for ColdFire and use the 16-bit relocations shown. For m68k (68020
and later), either the syntax shown may be used, with a 32-bit
relocation, or a ".w" suffix may be used, with a 16-bit relocation.
It is proposed that the compiler, on m68k (68020 and later), will use
".w" for -fpic and the 32-bit offsets otherwise. (No specific option
is proposed to choose between 16-bit and 32-bit offsets for the
non-PIC, initial exec case, though such an option could be added
later.)

The same issue as for GOT accesses also applies to accesses to TLS
data using the local dynamic and local exec models. The example code
sequences determine the address of the variable, but typically it will
be desired to read or write the variable and this may be done more
efficiently using offset addressing. It is proposed that by default
the compiler will require the relevant TLS area to be accessible using
16-bit offsets, and that an option -mxtls must be used when compiling
objects that use the local dynamic or local exec models and will be
linked into a module with too large a TLS area for 16-bit offset
addressing.

Conventions
-----------

In the instruction sequences shown below, a5 is used to refer to the
GOT pointer (which must already have been loaded). Apart from the
ABI-defined registers used for thread-pointer return (a0) and
__tls_get_addr return (d0), other registers may be used where
convenient. The relocations shown on instructions are to be
understood to be applied to the extension word or words of those
instructions.

Code sequences are shown in the form:

    instruction                  relocation          against variable

General Dynamic TLS model
-------------------------

Code sequence:

    pea #x@TLSGD(%a5)            R_68K_TLS_GD16      x
    jbsr __tls_get_addr

Outstanding relocations:

    GOT[n]                       R_68K_TLS_DTPMOD32  x
    GOT[n+1]                     R_68K_TLS_DTPREL32  x

The R_68K_TLS_GD16 relocation causes the static linker to allocate two
consecutive GOT entries for a tls_index structure and apply the
indicated relocations to them. The dynamic linker fills in those
entries at runtime. The code sequence leaves the address of x in d0.

On ColdFire, the example code sequence is limited to a 16-bit GOT
offset, as discussed above. If a larger GOT is required on ColdFire,
a longer instruction sequence must be used; for example:

    move.l %a5,%a0
    add.l #x@TLSGD,%a0           R_68K_TLS_GD32      x
    pea (%a0)
    jbsr __tls_get_addr

Local Dynamic TLS model
-----------------------

Code sequence:

    pea #x@TLSLDM(%a5)           R_68K_TLS_LDM16     x
    jbsr __tls_get_addr
    ...
    move.l %d0,%a1
    add.l #x1@TLSLDO,%a1         R_68K_TLS_LDO32     x1

Outstanding relocations:

    GOT[n]                       R_68K_TLS_DTPMOD32  x

The R_68K_TLS_LDM16 relocation causes the static linker to allocate
two consecutive GOT entries for a tls_index structure and apply the
indicated relocation to the first; the second has a value of 0 and no
relocation. The dynamic linker fills in those entries at runtime.
The first part of the code sequence leaves the address of the TLS
block for the current module (biased by 0x8000 as discussed above) in
%d0.

The second part of the code sequence determines the address of x1
based on the address of the TLS block; the static linker resolves
R_68K_TLS_LDO32 to the correct offset from the (biased) DTP value.
Other code sequences may be used to access the value of x1 rather than
computing its address, possibly with R_68K_TLS_LDO16 relocations
depending on whether the size of the TLS area for this module is known
to be at most 64k.
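
The arithmetic behind the local dynamic sequence can be modelled in C.
This is only a sketch of the calculation described above, not code
from any toolchain: the function names ldo_value, ld_address and
ldo_fits_16 are invented for illustration, addresses are treated as
plain 32-bit integers, and the only constant taken from this draft is
the 0x8000 DTP bias.

```c
#include <assert.h>
#include <stdint.h>

/* DTP bias from this draft (TLS_DTV_OFFSET in glibc terms).  */
#define TLS_DTV_OFFSET 0x8000

/* Hypothetical: the value R_68K_TLS_LDO32 would resolve to for a
   variable OFFSET_IN_BLOCK bytes into its module's TLS block, i.e.
   its offset from the biased DTP value.  */
static int32_t
ldo_value (uint32_t offset_in_block)
{
  assert (offset_in_block <= (uint32_t) INT32_MAX);
  return (int32_t) offset_in_block - TLS_DTV_OFFSET;
}

/* Hypothetical model of "move.l %d0,%a1; add.l #x1@TLSLDO,%a1":
   BIASED_BLOCK is what __tls_get_addr left in d0.  The addition
   wraps modulo 2^32, as it would in a 32-bit register.  */
static uint32_t
ld_address (uint32_t biased_block, int32_t ldo)
{
  return biased_block + (uint32_t) ldo;
}

/* Nonzero if LDO fits a signed 16-bit extension word, so that
   R_68K_TLS_LDO16 could be used instead of the 32-bit form.  */
static int
ldo_fits_16 (int32_t ldo)
{
  return ldo >= INT16_MIN && ldo <= INT16_MAX;
}
```

For a block starting at the made-up address 0x10000, d0 would hold
0x18000, and a variable 0x24 bytes into the block gets an LDO value of
0x24 - 0x8000, so the sum comes back to 0x10024. Offsets 0 through
0xffff map onto the full signed 16-bit range, which is where the 64k
limit for the 16-bit relocation comes from.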

Note that the local dynamic model is generally only beneficial if a
function is accessing more than one TLS variable with this model and
so can reuse the TLS block address.

The same comments about GOT size apply as for the general dynamic
model.

Initial Exec TLS model
----------------------

Code sequence:

    jbsr __m68k_read_tp
    ...
    move.l #x@TLSIE(%a5),%a1     R_68K_TLS_IE16      x
    add.l %a0,%a1

Outstanding relocations (apart from those associated with calling
__m68k_read_tp through the PLT):

    GOT[n]                       R_68K_TLS_TPREL32   x

The jbsr instruction loads the thread pointer into a0. This may be
reused for each variable accessed with this model.

Each R_68K_TLS_IE16 relocation causes the allocation of a single GOT
entry with the indicated relocation; this GOT entry is set up by the
dynamic linker with the offset for that TLS variable relative to the
(biased) thread pointer. The second part of the code sequence loads
this offset from the GOT and adds the thread pointer to put the
address of x in a1.

The same comments about GOT size apply as for the general dynamic and
local dynamic models.

Local Exec TLS model
--------------------

Code sequence:

    jbsr __m68k_read_tp
    ...
    move.l %a0,%a1
    add.l #x@TLSLE,%a1           R_68K_TLS_LE32      x

No outstanding relocations (apart from those associated with calling
__m68k_read_tp through the PLT).

The jbsr instruction loads the thread pointer into a0. This may be
reused for each variable accessed with this model or the initial exec
model.

The R_68K_TLS_LE32 relocation is resolved by the static linker to the
offset of x relative to the (biased) thread pointer. The second part
of the code sequence puts the address of x in a1. Other code
sequences may be used to access the value of x rather than computing
its address, possibly with R_68K_TLS_LE16 relocations depending on
whether all of the TLS area for the executable is known to be within
32k of the thread pointer.
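
The effect of the 0x7000 thread-pointer bias on the local exec offsets
can be checked with a small C model. As before this is a sketch under
stated assumptions rather than anything from glibc or binutils: the
names biased_tp, le_value and le_fits_16 are invented for
illustration, addresses are plain 32-bit integers, and offsets are
measured from the start of the TLS data areas (negative offsets stand
for data placed before it, such as the TCB).

```c
#include <assert.h>
#include <stdint.h>

/* Thread-pointer bias from this draft (TLS_TCB_OFFSET in glibc).  */
#define TLS_TCB_OFFSET 0x7000

/* Hypothetical: thread pointer value for TLS data areas that start
   at TLS_START.  */
static uint32_t
biased_tp (uint32_t tls_start)
{
  return tls_start + TLS_TCB_OFFSET;
}

/* Hypothetical: the value R_68K_TLS_LE32 would resolve to for data
   OFFSET_FROM_START bytes from the start of the TLS data areas.  */
static int32_t
le_value (int32_t offset_from_start)
{
  assert (offset_from_start >= INT32_MIN + TLS_TCB_OFFSET);
  return offset_from_start - TLS_TCB_OFFSET;
}

/* Nonzero if LE fits a signed 16-bit extension word, so that
   R_68K_TLS_LE16 could be used.  */
static int
le_fits_16 (int32_t le)
{
  return le >= INT16_MIN && le <= INT16_MAX;
}
```

In this model the 16-bit form reaches offsets from -0x1000 up to
0xefff relative to the start of the TLS data: 60k of TLS data plus 4k
before it. A 0x8000 bias would reach a full 64k of TLS data but
nothing before its start.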

Debug information
-----------------

DWARF-2 sequence:

    DW_OP_addr
    .word #x@TLSLDO+0x8000       R_68K_TLS_LDO32     x
    DW_OP_GNU_push_tls_address

No outstanding relocations.

The static linker resolves the relocation and offset to put the
unbiased address of x relative to the TLS block for its module in the
word of debug information. GDB then uses this to locate the variable
at debug time.

ELF relocations
---------------

Static relocations:

    #define R_68K_TLS_GD32      25
    #define R_68K_TLS_GD16      26
    #define R_68K_TLS_GD8       27
    #define R_68K_TLS_LDM32     28
    #define R_68K_TLS_LDM16     29
    #define R_68K_TLS_LDM8      30
    #define R_68K_TLS_LDO32     31
    #define R_68K_TLS_LDO16     32
    #define R_68K_TLS_LDO8      33
    #define R_68K_TLS_IE32      34
    #define R_68K_TLS_IE16      35
    #define R_68K_TLS_IE8       36
    #define R_68K_TLS_LE32      37
    #define R_68K_TLS_LE16      38
    #define R_68K_TLS_LE8       39

Dynamic relocations:

    #define R_68K_TLS_DTPMOD32  40
    #define R_68K_TLS_DTPREL32  41
    #define R_68K_TLS_TPREL32   42

-- 
Joseph S. Myers
joseph@xxxxxxxxxxxxxxxx