On 24-06-2011 20:10, Agner Fog wrote:
Now I can make a 64 bit shared object with -mcmodel=large and without
-fpic and it works. But as I suspected, it is not optimal. It uses
full 64 bit addresses for almost everything rather than 32 bit
relative addresses. This is inefficient because it makes the code
larger and because 64 bit addressing is poorly supported in the x64
instruction set. It loads the full absolute address into a 64-bit
pointer register whenever it reads or writes memory using any register
other than eax, and whenever it calls an external function.
I need a memory model between medium and large to allow 32-bit
relative addresses but not 32-bit absolute addresses.
I found it! There is a gcc option named -fpie which does exactly this.
The manual says:
"These options are similar to -fpic and -fPIC, but generated position
independent code can be only linked into executables."
I tried it on a 64-bit example. What it really does is make relative
references wherever it can, even in exception handler tables. It made
absolute 64-bit references in only a few situations, and it never made
absolute 32-bit references.
With this option, I can make a 64-bit shared object without PLT and GOT,
and it works. The only thing that doesn't work is global variables with
default visibility. I have to avoid global variables or hide them with
"static" or "__attribute__((visibility("hidden")))" to avoid an error
from the linker, which expects a GOT entry.
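
As a minimal sketch of what hiding a global looks like in practice (the
file, variable and function names below are made up for illustration, and
I am assuming gcc's attribute syntax):

```c
/* counter.c: hypothetical example; all names are illustrative only. */

/* File-local: "static" keeps the symbol out of the dynamic symbol table. */
static int call_count = 0;

/* Visible to other source files of the same library, but not exported
   from the shared object. */
__attribute__((visibility("hidden"))) int library_state = 0;

/* Exported function that touches both variables. */
int bump(void)
{
    call_count += 1;
    library_state = call_count * 2;
    return library_state;
}
```

Neither variable can be replaced from outside the library, so the
compiler should be free to address both relative to the instruction
pointer rather than through a GOT entry.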
It does make a GOT entry, though, if there is a virtual table. This
makes sense because the virtual table must be shared.
The -fpie option gives no advantage in 32-bit mode; it still makes GOT
references there.
My conclusion now is:
If you don't need the ability to replace symbols, you can make a shared
object much faster in the following way:
In 32-bit Linux: compile without -fpic.
In 64-bit Linux: compile with -fpie instead of -fpic; avoid global
variables or hide them.
You avoid the GOT and PLT lookups for local references, and you get rid
of the clumsy calculation of relative addresses to the GOT in 32-bit mode.
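
To make the recipe concrete, here is a rough sketch of a small library
source file. The file name, the MYLIB_EXPORT macro and all identifiers are
hypothetical, and the build commands in the comment are only my reading of
the conclusion above; they have not been checked on other compiler or
linker versions. It uses gcc's "#pragma GCC visibility" to hide a whole
group of globals at once instead of marking each one individually.

```c
/* mylib.c: hypothetical file name.
 *
 * Assumed build commands, following the conclusion above (not verified
 * on every toolchain):
 *   32-bit:  gcc -m32 -O2 -shared mylib.c -o libmylib.so
 *   64-bit:  gcc -m64 -O2 -fpie -shared mylib.c -o libmylib.so
 */

/* Hypothetical helper macro for the few symbols that should be exported. */
#define MYLIB_EXPORT __attribute__((visibility("default")))

#pragma GCC visibility push(hidden)   /* everything below is hidden ... */

int table[4] = { 1, 2, 3, 4 };        /* internal global, not exported */

int sum_table(void)                   /* internal helper, not exported */
{
    int total = 0;
    int i;
    for (i = 0; i < 4; i++)
        total += table[i];
    return total;
}

#pragma GCC visibility pop            /* ... until here */

/* The only symbol meant to be called from outside the library. */
MYLIB_EXPORT int mylib_total(void)
{
    return sum_table();
}
```

The point of the design is that only mylib_total is visible to the
dynamic linker; the references to table and sum_table stay local to the
library.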
I don't know if this works in BSD and Mac.
Thank you everybody for explaining things to me. I wouldn't have found
the solution alone.