Re: Draft TLS/NPTL ABI for m68k and ColdFire, version 0.2

Roman Zippel <zippel@xxxxxxxxxxxxxx> · Mon, 3 Dec 2007 18:40:27 +0100 (CET)

Hi,

On Fri, 30 Nov 2007, Joseph S. Myers wrote:

Kernel helpers
--------------

This TLS ABI defines a function __m68k_read_tp, provided by libc.
This returns the thread pointer in register a0 (not d0) and may
clobber other call-clobbered registers.  The compiler will generate
calls to this function for the initial exec and local exec models.
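For illustration only (this is not part of the draft): the C fragment below declares thread-local variables with GCC's tls_model attribute. A host compiler will use its own native TLS sequences for them. Under the proposed ABI, an m68k/ColdFire compiler would instead be expected to reach such variables by calling __m68k_read_tp and adding the variable's offset to the thread pointer returned in a0. The names counter_le, counter_ie and bump are hypothetical.

```c
/* Thread-local variables with explicit TLS models (GCC/Clang attribute).
   Under the draft ABI (an assumption for illustration), an m68k/ColdFire
   compiler would access these by calling __m68k_read_tp and adding each
   variable's offset to the thread pointer it returns in a0. */
static __thread int counter_le __attribute__((tls_model("local-exec"))) = 1;
__thread int counter_ie __attribute__((tls_model("initial-exec"))) = 2;

/* Update both per-thread counters and return their sum. */
int bump(void)
{
    counter_le += 10;   /* local exec: offset known at static link time */
    counter_ie += 20;   /* initial exec: offset loaded from the GOT */
    return counter_le + counter_ie;
}
```

In a fresh thread, the first call to bump() returns 11 + 22 = 33 and the second returns 21 + 42 = 63.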

To implement this function and other requirements for NPTL, four
kernel helpers are to be provided in a vDSO (as provided by the kernel
on Power and other architectures).  The symbols indicated are exported
at symbol version LINUX_2.6.  Full DWARF unwind information for all
these functions must be included in the vDSO, as thread cancellation
may need to unwind from any point in any of these functions.  The
kernel informs glibc of the location of the vDSO by putting an
AT_SYSINFO_EHDR entry in the auxiliary vector passed to each process.
If glibc is configured for a subset of processors where the necessary
operations do not require a kernel helper, then it does not need to
use the kernel helper (for example, glibc configured only for m68k
processors with a cas instruction does not need to use the
compare-and-exchange helper).  However, the kernel must provide all
these helpers on all m68k and ColdFire processors, so that
lowest-common-denominator glibc binaries can work across all
processors.
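As a hedged sketch, assuming a C library that provides getauxval() (glibc 2.16 or later, which postdates this draft), application code can read the same AT_SYSINFO_EHDR entry that glibc consumes during startup. The helper names find_vdso and vdso_looks_like_elf are hypothetical. Finding __kernel_read_tp and the other helpers at symbol version LINUX_2.6 would additionally require walking the vDSO's dynamic symbol and version tables, which is omitted here.

```c
#include <elf.h>       /* AT_SYSINFO_EHDR, ELFMAG, SELFMAG */
#include <stddef.h>    /* NULL */
#include <string.h>    /* memcmp */
#include <sys/auxv.h>  /* getauxval */

/* Return the base address of the vDSO ELF image that the kernel announced
   through the AT_SYSINFO_EHDR auxiliary vector entry, or NULL if the
   kernel did not supply one. */
const void *find_vdso(void)
{
    unsigned long base = getauxval(AT_SYSINFO_EHDR);
    return base ? (const void *)base : NULL;
}

/* Sanity check: a mapped vDSO image starts with the ELF magic bytes. */
int vdso_looks_like_elf(const void *vdso)
{
    return vdso != NULL && memcmp(vdso, ELFMAG, SELFMAG) == 0;
}
```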

The helper __kernel_read_tp returns the thread pointer in register a0
(not d0) and may clobber other call-clobbered registers.  (Because it
is only called from __m68k_read_tp, which is called through the PLT,
and the resolver may clobber call-clobbered registers, there seems to
be no advantage in restricting clobbers from this helper.)

Why is there a need for separate __kernel_read_tp/__m68k_read_tp? Wouldn't
this add one unnecessary indirection? Couldn't one of them just be an
alias for the other?
Personally, I'd call them ..._get_tp/_set_tp (i.e. closer to what ARM
uses).

The helper __kernel_write_tp sets the thread pointer to the value in
a0.  It does not clobber any registers other than the condition codes.

This function is not performance-critical, so I'd keep its clobber rules
in line with those above (i.e. allow it to clobber call-clobbered
registers as well).

Offset length issues
--------------------

On ColdFire (and m68k before 68020), only 16-bit offsets can be used
in memory addresses.  On m68k (68020 and later), 32-bit offsets can be
used; a ".w" assembly suffix is used for 16-bit offsets, and otherwise
the offsets are 32 bits.

The use of 16-bit offsets limits GOT size to 8192 entries (the
toolchain does not use negative GOT offsets on m68k/ColdFire).  On
m68k (68020 and later), GCC uses 32-bit offsets with -fPIC and 16-bit
offsets with -fpic (and does not need to use GOT accesses for non-PIC
code at present).
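The 8192-entry figure follows from the displacement range: a signed 16-bit displacement reaches non-negative offsets up to 32767, and with 4-byte entries starting at offset 0 the last reachable entry starts at offset 32764. A small C sketch of that arithmetic (the macro and function names are hypothetical):

```c
#include <stdint.h>  /* INT16_MAX */

/* Largest non-negative offset reachable with a signed 16-bit displacement. */
#define MAX_D16_OFFSET INT16_MAX   /* 32767 */

/* Size of one GOT entry on m68k/ColdFire (32-bit pointers). */
#define GOT_ENTRY_SIZE 4

/* Entries at offsets 0, 4, ..., 32764 are reachable: 32767 / 4 + 1 = 8192. */
long max_got_entries_d16(void)
{
    return MAX_D16_OFFSET / GOT_ENTRY_SIZE + 1;
}

/* Whether 0-based GOT entry `index` can be addressed with a 16-bit offset,
   given that the toolchain does not use negative GOT offsets. */
int got_index_fits_d16(long index)
{
    return index >= 0 && index * GOT_ENTRY_SIZE <= MAX_D16_OFFSET;
}
```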

The proposals here do not address GOT size limitations, although an
example is given to illustrate a possible longer access sequence to
avoid those limitations on ColdFire.  The examples using offsets such
as #x@TLSGD in GOT accesses are shown for ColdFire and use the 16-bit
relocations shown.  For m68k (68020 and later), either the syntax
shown may be used, with a 32-bit relocation, or a ".w" suffix may be
used, with a 16-bit relocation.  It is proposed that the compiler, on
m68k (68020 and later), will use ".w" for -fpic and 32-bit offsets
otherwise.  (No specific option is proposed to choose between 16-bit
and 32-bit offsets for the non-PIC, initial exec case, though such an
option could be added later.)

The same issue as for GOT accesses also applies to accesses to TLS
data using the local dynamic and local exec models.  The example code
sequences determine the address of the variable, but typically it will
be desired to read or write the variable and this may be done more
efficiently using offset addressing.  It is proposed that by default
the compiler will require the relevant TLS area to be accessible using
16-bit offsets, and that an option -mxtls must be used when compiling
objects that use the local dynamic or local exec models and will be
linked into a module with too large a TLS area for 16-bit offset
addressing.
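A minimal sketch of the check that decides whether -mxtls would be needed. It assumes, hypothetically, that the relevant TLS area is described by its starting offset from the base register and its size in bytes. The draft does not say where the base register points within the TLS area, so the function also accepts negative start offsets. Requiring the whole block to lie within the signed 16-bit range is slightly conservative, since only the start of each variable has to be reachable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does a TLS block of `size` bytes, starting at offset
   `start` from the base register, lie entirely within the range of a signed
   16-bit displacement?  If not, objects using the local dynamic or local
   exec models would need -mxtls under the draft proposal. */
bool tls_block_fits_d16(int32_t start, uint32_t size)
{
    if (start < INT16_MIN || start > INT16_MAX)
        return false;
    if (size == 0)
        return true;
    /* Offset of the last byte; computed in 64 bits to avoid overflow. */
    int64_t last = (int64_t)start + (int64_t)size - 1;
    return last <= INT16_MAX;
}
```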

Using 16-bit offsets has advantages on m68k too, as the extra 16 bits of
offset make the instruction 32 bits larger.
However, I'm not comfortable with forcing a specific model at the ABI
level. I'd rather leave the default to the system environment and
provide two options to select the model explicitly (e.g. FRV has
-mtls/-mTLS).
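To put numbers on the size argument above (68020 and later only; ColdFire has no 32-bit displacement form): (d16,An) needs one 16-bit extension word, while (bd,An) with a 32-bit base displacement needs a full-format extension word plus the 32-bit displacement. The hypothetical helper below counts only those extension bytes, ignoring the opcode word, which is the same in both cases.

```c
/* Extension bytes needed for an address-register-relative operand on the
   68020 and later, depending on whether the displacement fits in 16 bits. */
int ext_bytes_for_disp(long disp)
{
    if (disp >= -32768 && disp <= 32767)
        return 2;      /* (d16,An): one 16-bit extension word */
    return 2 + 4;      /* (bd.l,An): full-format extension word + 32-bit bd */
}
```

The difference is 4 bytes, i.e. 32 bits, per instruction that has to fall back to a 32-bit offset.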

Otherwise the rest looks good; details will probably have to be dealt
with during implementation anyway.
I've already experimented with a vDSO implementation and tried a few
possibilities. There are subtle problems when that page is written to
(e.g. by a debugger via ptrace): making sure that at the next context
switch the correct thread value is written to the correct page...

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-m68k" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html