[Sorry for the 3-way crosspost!]

One of the big holes in the MIPS ABI has always been the lack of support for non-PIC executables. Any call that might be to a DSO must be made indirectly via $25, and any data that might be defined in a DSO must be accessed via the GOT. MIPS has no PLTs or copy relocations.

There has been talk of changing this at various times over the years. In true bus style, nothing much happened for a long time, then two implementations came along at once. I implemented non-PIC support for Specifix, as part of a more general project to allow MIPS16 code to be used on GNU/Linux. At the same time, CodeSourcery implemented it for Sourcery G++. I only found out about CS's version recently, after finishing the Specifix one, and I think the same is true in reverse. Oh well!

I suppose the good news is that we can pick the best bits of each implementation as the official one. I'll describe my implementation below, then compare it to what I understand CS's version to be. CS folks: please correct me if I'm wrong. Dan said that he'd be submitting CS's version too.

First of all, I should emphasise that this is intended to be a pure ABI extension. Existing objects should continue to work, and existing ET_REL objects should be link-compatible with the new objects.

Example
-------

To take a concrete example, suppose an executable has code like this:

    extern void foo (int);
    extern int x;
    void bar (void) { foo (x); }

The compiler has no information about where "foo" and "x" are defined, so for safety's sake, it must assume that they might be defined in a shared library. We are currently forced to generate code like this:

    bar:
        .set    noreorder
        lui     $28,%hi(__gnu_local_gp)
        addiu   $28,$28,%lo(__gnu_local_gp)
        lw      $25,%call16(foo)($gp)
        jr      $25
        lw      $4,%got(x)($gp)
        .set    reorder

(This is the "-mno-shared" form. The "-mshared" version would replace the first two instructions with ".cpload $25" and expect $25 to be valid on entry.)

This is very inefficient if "x" and "foo" turn out to be defined in the executable itself. In contrast, the non-PIC implementation of "bar" is simply:

    bar:
        .set    noreorder
        lui     $4,%hi(x)
        j       foo
        lw      $4,%lo(x)($4)
        .set    reorder

So the aim is to allow this non-PIC version of "bar" to be used in dynamic executables. It is the static linker's responsibility to ensure that "bar" works when:

  - "x" is defined by a shared library
  - "foo" is a PIC function in the same executable
  - "foo" is defined by a shared library

It needs help from the dynamic linker to handle the first and third cases.

Copy relocations
----------------

As on most other SVR4 targets, we want to use dynamic relocations for full-word references like:

        .data
        .word   x

However, we want to use copy relocations if the reference is in a read-only section, such as:

        .set    mips16
        lw      $2,1f
        jr      $31
        .align  2
    1:  .word   x

or if the reference is not a full-word one:

        lui     $2,%hi(x)
        addiu   $2,$2,%lo(x)

Fortunately, VxWorks has already allocated an R_MIPS_COPY relocation type. We can simply extend it to GNU objects as well.

PLTs
----

MIPS has traditionally not used PLTs. Instead, it has a special form of lazy binding stub that is local to the object; unlike a PLT entry, this stub does not participate in name lookup.

These stubs can only be used when all references are through R_MIPS*_CALL* relocations. The GOT slot starts off pointing at the stub, then the stub redirects it to the real function. References like:

        .word   f

prevent lazy binding. In contrast, PLTs would allow function references of the forms:

        j       f
        jal     f
        lui     $2,%hi(f)
        addiu   $2,$2,%lo(f)

They would also allow references of the form:

        .word   f

to be lazily bound.

Adding PLTs gives us three possible ways of referring to an externally-defined function:

  - a full-word dynamic relocation (R_MIPS_REL32, possibly combined with R_MIPS_64)

    These relocations can only be used if all references are full-word ones. They prevent lazy binding, so we only use them for traditional PIC objects.

  - a traditional MIPS lazy-binding stub

    These stubs can only be used if all references to the function are through R_MIPS*_CALL* relocations. However, they are the most efficient way of handling that situation. Once the function has been resolved, all calls go directly to the real target.

  - a PLT stub

    PLTs are the fallback, and provide a second, more general, form of lazy binding.

As before, we can appropriate the VxWorks R_MIPS_JUMP_SLOT relocation and use it for GNU objects too.

Many (but not all) targets put .got.plt in the main .got section. I don't think it makes sense to do this for MIPS. We never use $gp-relative accesses for .got.plt, so putting it in .got would steal valuable room in the primary GOT.

DT_JMPREL, DT_PLTREL and DT_PLTSZ describe .rel.plt in the usual way. Objects without PLTs do not have these tags.

At the moment, there are two sorts of GOT header:

  - When the top bit of _GLOBAL_OFFSET_TABLE_[1] is clear:

      - _GLOBAL_OFFSET_TABLE_[0] points to the resolver for the traditional lazy-binding stubs.
      - _GLOBAL_OFFSET_TABLE_[1] is the first local or global GOT entry.

    This is the traditional SVR4 GOT. glibc still supports it.

  - When the top bit of _GLOBAL_OFFSET_TABLE_[1] is set:

      - _GLOBAL_OFFSET_TABLE_[0] points to the resolver for the traditional lazy-binding stubs.
      - _GLOBAL_OFFSET_TABLE_[1] is a module pointer.
      - _GLOBAL_OFFSET_TABLE_[2] is the first local or global GOT entry.

    This is a GNU extension that the linker has used for a long time. glibc keeps the top bit of _GLOBAL_OFFSET_TABLE_[1] set, but uClibc does not.

We need a further GOT entry for resolving PLTs, so the obvious thing is to reserve _GLOBAL_OFFSET_TABLE_[2]. There are then three GOT layouts:

  - When the top bit of _GLOBAL_OFFSET_TABLE_[1] is clear:

    Layout as before.

  - When the top bit of _GLOBAL_OFFSET_TABLE_[1] is set and there is no DT_JMPREL tag:

    Layout as before.

  - When the top bit of _GLOBAL_OFFSET_TABLE_[1] is set and there is a DT_JMPREL tag:

      - _GLOBAL_OFFSET_TABLE_[0] points to the resolver for the traditional lazy-binding stubs.
      - _GLOBAL_OFFSET_TABLE_[1] is a module pointer.
      - _GLOBAL_OFFSET_TABLE_[2] points to the PLT resolver.
      - _GLOBAL_OFFSET_TABLE_[3] is the first local or global GOT entry.

The PLT resolver needs to obtain two bits of information:

  - the module pointer
  - the target function's index in .got.plt/.rel.plt

ARM is another target that places .got.plt separately from .got. Loosely following its example, I used this resolver interface:

    $14 : the start of .got.plt
    $15 : &_GLOBAL_OFFSET_TABLE_[2]
    $24 : the .got.plt entry for the target function

Thus "$15 - sizeof (void *)" points to the module pointer and "($24 - $14) / sizeof (void *)" is the PLT index.

Note that I chose to pass $24 instead of the relocation index itself because we can then use a 4-instruction PLT entry without imposing any limit on the _number_ of PLT entries.

Although the dynamic linker could work out the executable's module pointer without $15, I thought it was better to have an interface that would work for shared libraries too, in case we ever do want to use PLTs for shared libraries in future.

The PLT entry for a function "f" is:

        lui     $24,%hi(.got.plt slot for f)
        lw      $25,%lo(.got.plt slot for f)($24)
        jr      $25
        addiu   $24,$24,%lo(.got.plt slot for f)

The PLT header itself is:

        lui     $15,%hi(&_GLOBAL_OFFSET_TABLE_[2])
        lw      $25,%lo(&_GLOBAL_OFFSET_TABLE_[2])($15)
        addiu   $15,$15,%lo(&_GLOBAL_OFFSET_TABLE_[2])
        lui     $14,%hi(.got.plt)
        jr      $25
        addiu   $14,$14,%lo(.got.plt)

The header is followed by 8 bytes of padding so that each PLT entry is 16-byte aligned.

This PLT entry is deliberately not compatible with MIPS I. I thought that fitting the PLT entry into a 16-byte cache line was more important than supporting such an obsolete ISA level. Hopefully anyone who still uses MIPS I won't mind sticking to the traditional scheme. (Hi Maciej!)

As on other SVR4 targets, PLT entries have type STT_FUNC and belong to SHN_UNDEF.
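
The address arithmetic in the resolver interface above can be sketched in C. This is only an illustration under stated assumptions: the helper names plt_index and module_pointer_slot are hypothetical, and the values of $14, $15 and $24 are passed in as plain integers rather than read from registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; only the register roles come from the text above.
   r14 = start of .got.plt
   r24 = address of the .got.plt entry for the target function */
static size_t plt_index(uintptr_t r14, uintptr_t r24)
{
    /* The slot must lie inside .got.plt, on a pointer-sized boundary. */
    assert(r24 >= r14);
    assert((r24 - r14) % sizeof(void *) == 0);
    return (r24 - r14) / sizeof(void *);
}

/* r15 = &_GLOBAL_OFFSET_TABLE_[2]; the module pointer is stored in
   _GLOBAL_OFFSET_TABLE_[1], which is one pointer below that address. */
static uintptr_t module_pointer_slot(uintptr_t r15)
{
    return r15 - sizeof(void *);
}
```

Because the index is derived from two addresses, nothing in a PLT entry has to encode the index as an immediate, which is why the entry can stay at four instructions however many entries there are.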

Unfortunately, as Nigel Stephens pointed out when discussing this a while ago with Dan Jacobowitz and me, giving PLT entries the STT_FUNC/SHN_UNDEF combination clashes with the traditional MIPS lazy-binding stubs, which use the same combination. I followed Nigel's suggestion of adding an STO_MIPS_PLT st_other value to distinguish PLTs from traditional stubs.

Linking PIC and non-PIC in the same object
------------------------------------------

Most targets allow PIC and non-PIC to be linked together, and it would be awkward if MIPS didn't. This means that the static linker has to cope with things like:

    a.s (non-PIC):
        jal     foo

    b.s (PIC):
    foo:
        .cpload $25
        ...

We can handle this situation in two ways. If the target function "foo" starts a section, and the section is not too heavily-aligned, we can insert:

        lui     $25,%hi(1f)
        addiu   $25,$25,%lo(1f)
    1:

immediately before it. This code goes in a new section and is padded with leading nops if necessary. "foo" then resolves to the "lui" instruction, so that all references to "foo" have the same address.

I think this is an important optimisation. In practice, most uses of PIC in executables will come from static libraries, which usually have one function per section.

However, if "foo" doesn't start a section, the linker must create a separate trampoline of the form:

    foo:
        lui     $25,%hi(.pic.foo)
        j       .pic.foo
        addiu   $25,$25,%lo(.pic.foo)

where .pic.foo is the original PIC form of "foo". Again, "foo" resolves to this trampoline, so that all references to "foo" have the same address.

These trampolines all go in a separate section at the beginning of .text. They are padded with a nop so that each one is aligned to 16 bytes.

ld -r
-----

It should be possible to use "ld -r" to link PIC and non-PIC together into a relocatable object. The result is clearly a non-PIC object, so what do we do with PIC functions? One option would be to add "la $25" prefixes or trampolines to all of them, but that would be inefficient.
I thought it would be better to mark PIC functions with a new st_other value, STO_MIPS_PIC. This allows the final link to distinguish between PIC and non-PIC functions in the same input file.

n32 and n64 GP-load sequences
-----------------------------

n32 and n64 use the idiom:

        lui     $28,%hi(%neg(%gp_rel(foo)))
        addiu   $28,$28,%lo(%neg(%gp_rel(foo)))
        addu    $28,$28,$25

to load the value of _gp. Such a reference to foo should not be redirected to an "la $25" stub.

Choice of new STO_* values
--------------------------

For reasons I don't understand, STO_MIPS16 is defined as 0xf0, taking up 4 of the 8 bits in st_other. Visibility accounts for 2 more. That gives us enough bits to treat STO_MIPS_PLT and STO_MIPS_PIC as orthogonal to both MIPS16ness and visibility, but we wouldn't have any room left over.

Fortunately, STO_MIPS_PLT, STO_MIPS_PIC and STO_MIPS16 are mutually exclusive, so we can simply reinterpret the top 4 bits of st_other as an enum. 0x0c would then be free for future extensions.

Identifying non-PIC relocatable objects
---------------------------------------

MIPS has two PICness flags: EF_MIPS_PIC and EF_MIPS_CPIC ("calls PIC"). We can therefore mark non-PIC abicalls objects as:

    (flags & (EF_MIPS_PIC | EF_MIPS_CPIC)) == EF_MIPS_CPIC

Assembler directives and command-line interface
-----------------------------------------------

The EF_MIPS_CPIC combination is generated by the assembler directives:

        .abicalls
        .option pic0

There is no command-line flag to select this mode, so I added one called -call_nonpic. I'm not too tied to that name though.

GCC command-line interface
--------------------------

GCC 4.2 has an "-mno-shared" option. As its name implies, this option can only be used to compile executables. It applies on top of "-mabicalls" and allows GCC to use absolute references for things that it can prove are defined by the executable itself.
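
As an aside on the ELF encodings discussed under "Choice of new STO_* values" and "Identifying non-PIC relocatable objects", the two tests can be sketched in C as follows. This is a minimal illustration, not code from the patches: the helper names are hypothetical, the EF_MIPS_PIC and EF_MIPS_CPIC values are the standard ones from the MIPS ELF headers, and no values are assigned to STO_MIPS_PLT or STO_MIPS_PIC because this message does not fix them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard MIPS e_flags bits. */
#define EF_MIPS_PIC   0x00000002u
#define EF_MIPS_CPIC  0x00000004u

/* st_other fields as described above. */
#define STO_MIPS16    0xf0u   /* top 4 bits, treated as an enum */
#define STO_VIS_MASK  0x03u   /* hypothetical name for the visibility bits */

/* Non-PIC abicalls object: CPIC set, PIC clear. */
static bool is_nonpic_abicalls(uint32_t e_flags)
{
    return (e_flags & (EF_MIPS_PIC | EF_MIPS_CPIC)) == EF_MIPS_CPIC;
}

/* ELF symbol visibility lives in the low two bits of st_other. */
static unsigned st_other_visibility(uint8_t st_other)
{
    return st_other & STO_VIS_MASK;
}

/* Treat the top nibble as a single enumerated field, so MIPS16 is an
   equality test rather than a bit test. */
static bool st_other_is_mips16(uint8_t st_other)
{
    return (st_other & 0xf0u) == STO_MIPS16;
}
```

With the top nibble read as an enum, bits 0x0c of st_other remain untouched by any of these helpers.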
Functions compiled with "-mno-shared" do not require $25 to be valid on entry, so the compiler can also use direct jumps and calls to functions in the same object file.

Of course, as in the example above, there are many cases in which the compiler cannot prove that something is defined by the executable. The non-PIC support is designed to plug that gap. So, from a conceptual point of view, the new functionality is really a special "-mno-shared" mode; it allows the compiler to use absolute references for all data except TLS.

I decided to add a new pair of GCC options, "-mgnu-plts" and "-mno-gnu-plts", that apply on top of "-mno-shared". There are then four basic forms of o32, n32 and n64 code:

  (1) -mno-abicalls

      For non-dynamic objects like the Linux kernel.

  (2) -mabicalls -mno-shared -mgnu-plts

      For dynamic executables. Code only uses the global offset table for some models of thread-local storage; it uses absolute accesses for everything else. If it has to call functions indirectly, such as for:

          void foo (void (*t) (void)) { t (); }

      it continues to call through $25.

      This combination requires support from both the static and dynamic linkers.

  (3) -mabicalls -mno-shared -mno-gnu-plts

      For dynamic executables. Code uses absolute accesses for objects that are defined in the executable itself. It uses direct calls for functions in the same translation unit.

      This combination requires support from the static linker. It does not affect the ABI of the final executable.

  (4) -mabicalls -mshared

      For any dynamic object. This is the traditional SVR4 mode.

Note that shared library code must still be compiled with "-fpic" or "-fPIC".

Having four types of executable is complicated, and is not the intended user interface.
From the user's perspective, GCC should have the following target-independent interface:

  - shared library code is compiled with "-fpic" or "-fPIC";
  - position-independent executables are compiled with "-fpie" or "-fPIE"; and
  - for efficiency, position-dependent executables are compiled without any of these "-f" flags.

In the last case, GCC should default to the most efficient executable model available.

Since 4.3, GCC's configure script automatically checks whether the static linker supports "-mno-shared". It can therefore choose between (3) and (4) without direct intervention. However, GCC's configure script cannot know whether the dynamic linker supports (2), so we really need a new configure option to choose between "(2)" and "(3) or (4)". I therefore added "--with-gnu-plts" and "--without-gnu-plts", where the latter is the default.

Non-dynamic executables like the Linux kernel remain a special case. They must continue to be compiled with "-mno-abicalls".

Linker interface
----------------

The linker should use the new extensions when compiling a non-PIC CPIC executable, but not otherwise. It might be useful to forbid the use of copy relocs and PLTs altogether, so I made -znocopyreloc turn off the extensions.

Linker errors
-------------

Shared-library code must be compiled with "-fpic" or "-fPIC". However, because MIPS compilers have traditionally used (4) as the default executable mode, the sorts of failure you get by forgetting "-fpic" or "-fPIC" have tended to be very subtle. For example, suppose we have:

    void foo (void) { ... }
    void bar (void) { ... foo (); ... }

Without "-fpic" or "-fPIC", GCC will think it can inline foo() into bar(). It will then be impossible for an executable (or for another shared library) to override foo() properly.

The failure for other targets is more drastic, so most cross-target build systems already do the right thing. However, some MIPS-specific build systems might not.

Changing the default executable mode from "(4)" to "(3) or (2)" makes the MIPS failure mode as drastic as it is for other targets. Unfortunately, the MIPS linker has traditionally not checked for accidental uses of absolute code in shared libraries; it would link the following code as a shared library without any diagnostic at all:

        lui     $4,%hi(x)
        addiu   $4,$4,%lo(x)

The resulting DSO would treat "x" as having the value 0.

This seems too dangerous, so I made the linker complain about any relocation that it cannot resolve itself and that it cannot implement dynamically. The errors have the form:

    non-dynamic relocations refer to dynamic symbol FOO

Comparison with the CS implementation
-------------------------------------

I think the main differences from CS's implementation are:

  - CS treat .got.plt as part of .got. See above for why I think it should be separate. Note that the PLT header is the same size for both implementations, so the extra parameters don't cost much.

  - CS PLT entries pass a PLT index rather than a .got.plt address. This makes no difference for most objects, but a longer stub is needed if there are more than 0x10000 PLT entries.

  - I couldn't see any specific support for ld -r in the CS version.

  - The CS version always uses separate "la $25" trampolines, rather than adding instructions to the beginning of a function. This is an implementation rather than an ABI detail though.

  - CS support MIPS I, at the cost of using the start of the next PLT entry as a delay slot instruction.

  - STO_MIPS_PLT is separate from STO_MIPS16.

This comparison is based on 4.2-129 and I've probably got it wrong. I'm not sure if CS's version supports n32 and n64, but adding it wouldn't be a big issue.

Patches
-------

In case it helps the discussion, I've attached patches for binutils, gcc and glibc. I'm not asking for approval though. Each set of patches has prerequisites that haven't been applied yet. I've therefore attached them in the form of a bzipped quilt.
The patches with .clog changelog files are the ones related to non-PIC support. The glibc patches are based on EGLIBC 2.6.

Richard

Attachment: binutils-quilt.tar.bz2
Attachment: gcc-quilt.tar.bz2
Attachment: eglibc-quilt.tar.bz2