On Tue, 2021-04-20 at 10:07 -0500, Peng Yu via Gcc-help wrote:
> On 4/20/21, Stefan Ring <stefanrin@xxxxxxxxx> wrote:
> > On Tue, Apr 20, 2021 at 4:27 PM Peng Yu via Gcc-help
> > <gcc-help@xxxxxxxxxxx> wrote:
> > >
> > > How does the linker know that it should look for the string literal
> > > in .rodata just using the object file?
> >
> > You should dump the object file's relocations, then you'll
> > understand.
>
> Here is the result of relocations. I don't understand how it can be
> used to resolve my question. Could you please clarify?

Then you should find and read a textbook about linking instead of
posting an off-topic question to a mailing list.  The linker is not
part of GCC, and neither is the disassembler.

> $ readelf -r a.o
>
> Relocation section '.rela.text' at offset 0x210 contains 2 entries:
>   Offset          Info           Type           Sym. Value    Sym. Name + Addend
> 000000000007  000500000002 R_X86_64_PC32     0000000000000000 .rodata - 4
> 00000000000c  000b00000004 R_X86_64_PLT32    0000000000000000 puts - 4
>
> Relocation section '.rela.eh_frame' at offset 0x240 contains 1 entry:
>   Offset          Info           Type           Sym. Value    Sym. Name + Addend
> 000000000020  000200000002 R_X86_64_PC32     0000000000000000 .text + 0
>
> > > $ objdump --disassemble=main a.o
> > > ...
> > > 0000000000000000 <main>:
> > >    0:   55                      push   %rbp
> > >    1:   48 89 e5                mov    %rsp,%rbp
> > >    4:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # b <main+0xb>
> > >    b:   e8 00 00 00 00          callq  10 <main+0x10>
> > >   10:   b8 00 00 00 00          mov    $0x0,%eax
> > >   15:   5d                      pop    %rbp
> > >   16:   c3                      retq
> > >
> > > Where is the number "10" in the callq line from? "00 00 00 00" is
> > > just address 0. So the disassembler knows "00 00 00 00" is not a
> > > legal address, so it just put the address of the next instruction,
> > > which is "10"?
> >
> > The displacement in the call instruction is 0, which is relative to
> > the address after the current instruction, which is 10. So the
> > disassembler displays it as a call to address 10.
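The arithmetic behind that explanation can be checked by hand. What follows is only a minimal sketch, not anything objdump itself runs: the helper name rel32_target is made up for illustration, and it assumes the last four bytes of the instruction are a little-endian signed 32-bit displacement, which holds for E8 (call rel32) and for the RIP-relative lea shown in the a.o listing.

```python
import struct


def rel32_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Return the address a trailing rel32 displacement refers to.

    Hypothetical helper: assumes the instruction ends with a little-endian
    signed 32-bit displacement that is relative to the next instruction.
    """
    (disp,) = struct.unpack("<i", insn_bytes[-4:])
    next_insn = insn_addr + len(insn_bytes)
    return next_insn + disp


# Addresses and bytes copied from the a.o disassembly of main.
call_target = rel32_target(0x0B, bytes.fromhex("e8 00 00 00 00"))
lea_target = rel32_target(0x04, bytes.fromhex("48 8d 3d 00 00 00 00"))

print(hex(call_target))  # 0x10, shown by objdump as <main+0x10>
print(hex(lea_target))   # 0xb, shown by objdump as <main+0xb>
```

With a zero displacement the result is simply the address of the following instruction, which is why the unrelocated object file shows "10" and "b" as targets.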
>
> How do I know it is a relative call? I can't even find the callq
> instruction in the Intel assembly manual. Where does the mnemonic
> callq come from?

"q" is just the AT&T-syntax suffix for a 64-bit operand; the Intel
manual lists the instruction as plain CALL.  You can see the hexdump is
E8 00 00 00 00.  It's clearly documented at page 3-122:

> E8 cd:
>
> CALL rel32
>
> near, relative, displacement relative to next
> instruction. 32-bit displacement sign extended to
> 64-bits in 64-bit mode.

https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf

> > > $ objdump --disassemble=main a.out
> > > ...
> > > 0000000000001135 <main>:
> > >     1135:   55                      push   %rbp
> > >     1136:   48 89 e5                mov    %rsp,%rbp
> > >     1139:   48 8d 3d c4 0e 00 00    lea    0xec4(%rip),%rdi        # 2004 <_IO_stdin_used+0x4>
> > >     1140:   e8 eb fe ff ff          callq  1030 <puts@plt>
> > >     1145:   b8 00 00 00 00          mov    $0x0,%eax
> > >     114a:   5d                      pop    %rbp
> > >     114b:   c3                      retq
> > > ...
> > >
> > > How does objdump figure out 0xec4(%rip) is the address
> > > _IO_stdin_used+0x4?
> >
> > I guess it just picks the closest symbol. It's basically meaningless
> > in this case.
>
> Are you sure it is nonsense? If it is nonsense, why is it generated?
> _IO_stdin_used is clearly a symbol in the .rodata section.
>
> $ readelf -s a.out | grep _IO_stdin_used
>     55: 0000000000002000     4 OBJECT  GLOBAL DEFAULT   16 _IO_stdin_used
> $ readelf -SW a.out | grep '\[16\]'
>   [16] .rodata           PROGBITS        0000000000002000 002000 000011 00   A  0   0  4

In a specific binary it may make some sense: +0x4 is just calculated as
an offset relative to the nearest preceding symbol ("_IO_stdin_used"
here).  Generally it's nonsense: the compiler and linker may reorder
objects, so there is no way to predict the location of an object
(unless using some tricky linker scripts).

-- 
Xi Ruoyao <xry111@xxxxxxxxxxxxxxxx>
School of Aerospace Science and Technology, Xidian University
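For readers who want to connect the relocation entries to the bytes in the linked binary, here is a rough sketch of the arithmetic. It assumes the usual x86-64 psABI formulas (S + A - P for R_X86_64_PC32, L + A - P for R_X86_64_PLT32) and takes the final addresses from the a.out listing quoted in this thread. The address 0x2004 for a.o's .rodata contribution is inferred from the disassembly, not read from a link map, and the helper names pc_relative_field and nearest_symbol are made up for illustration. The symbol lookup is a simplification of what a disassembler does.

```python
import struct


def pc_relative_field(target: int, addend: int, place: int) -> bytes:
    """Bytes the linker stores for a 32-bit PC-relative relocation.

    target: S (symbol address) or L (PLT entry address)
    addend: A, from the relocation entry
    place:  P, the address of the 4-byte field being patched
    """
    return struct.pack("<i", target + addend - place)


def nearest_symbol(addr: int, symbols: dict) -> str:
    """Label addr as 'name+offset' using the closest symbol at or below it."""
    name, base = max(
        ((n, a) for n, a in symbols.items() if a <= addr),
        key=lambda item: item[1],
    )
    return name if addr == base else f"{name}+{addr - base:#x}"


MAIN = 0x1135      # address of main in a.out
STRING = 0x2004    # assumption: where a.o's .rodata ended up in a.out
PUTS_PLT = 0x1030  # address of puts@plt in a.out

# Relocation offsets 0x07 and 0x0c are relative to the start of .text in a.o,
# which is where main begins.
lea_field = pc_relative_field(STRING, -4, MAIN + 0x07)
call_field = pc_relative_field(PUTS_PLT, -4, MAIN + 0x0C)

print(lea_field.hex())   # c40e0000, the operand bytes of the lea
print(call_field.hex())  # ebfeffff, the operand bytes of the callq

SYMBOLS = {"main": MAIN, "_IO_stdin_used": 0x2000}
print(nearest_symbol(STRING, SYMBOLS))  # _IO_stdin_used+0x4
```

The addend of -4 compensates for the fact that the CPU measures the displacement from the end of the 4-byte field, while P is the address of its start.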