Hi,

I am trying to understand the code generated when accessing global variables on PIE code and the reason for some differences between 32-bit and 64-bit cases.

For the example input file

    int a, b;
    int f(void) { return a + b; }

With -m64 identical code is generated between -fcommon and -fno-common:

    movl    b(%rip), %eax
    addl    a(%rip), %eax

using R_X86_64_PC32 relocations. So in both cases the calculation is S + A - P and we do not use any GOT entries.

With -m32 -fno-common:

    call    __x86.get_pc_thunk.dx
    addl    $_GLOBAL_OFFSET_TABLE_, %edx
    movl    b@GOTOFF(%edx), %eax
    addl    a@GOTOFF(%edx), %eax

using R_386_GOTOFF relocations. Calculation is S + A - GOT. The code does not go through the GOT entries.

With -m32 -fcommon:

    call    __x86.get_pc_thunk.ax
    addl    $_GLOBAL_OFFSET_TABLE_, %eax
    movl    a@GOT(%eax), %edx
    movl    b@GOT(%eax), %eax
    movl    (%eax), %eax
    addl    (%edx), %eax

using R_386_GOT32X relocations calculated as G + A - GOT. This time we use the GOT entries.

Why can the -m32 -fcommon case not use the same code as the -fno-common case in order to avoid indirecting through the GOT, while for 64-bit we can achieve that regardless of the -fcommon setting? For 64-bit, indirection through GOT appears to be done only with -fPIC.

PS: Separate comment on 32-bit code: if the function only uses R_386_GOTOFF relocations, is it not possible to replace them with R_386_PC32, adjusting the addend for the number of instruction bytes between the prologue code where we loaded the PC and the instruction that accesses the variable? This would eliminate the instruction to add $_GLOBAL_OFFSET_TABLE_. With the addition of an R_386_GOTPCREL relocation similar to the 64-bit case, the R_386_GOT32 relocations could also be handled that way.
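To make the relocation arithmetic above concrete, here is a small C sketch that evaluates the three formulas for one variable and checks which address each access path ends up at. All addresses and helper names (GOT_BASE, SYM_B, GOT_SLOT_B, RET_ADDR, PLACE, ea_gotoff and so on) are made up for illustration and are not taken from a real link. It also assumes that G in G + A - GOT can be treated as the address of the symbol's GOT slot. The last function models the idea from the PS: keep the PC returned by the thunk in the register and use S + A - P with A set to the distance from that PC to the relocated field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit addresses, chosen only for illustration. */
enum {
    GOT_BASE   = 0x3ff4, /* value of _GLOBAL_OFFSET_TABLE_ */
    SYM_B      = 0x4010, /* S: address of variable b */
    GOT_SLOT_B = 0x3fec, /* address of the GOT slot holding &b (assumed) */
    RET_ADDR   = 0x1195, /* PC value returned by __x86.get_pc_thunk */
    PLACE      = 0x11a3  /* P: address of the 4-byte field being relocated */
};

/* R_386_PC32 / R_X86_64_PC32: S + A - P */
int32_t pc32(uint32_t s, int32_t a, uint32_t p)
{
    return (int32_t)(s + (uint32_t)a - p);
}

/* R_386_GOTOFF: S + A - GOT */
int32_t gotoff(uint32_t s, int32_t a, uint32_t got)
{
    return (int32_t)(s + (uint32_t)a - got);
}

/* R_386_GOT32X as quoted in the post: G + A - GOT.
 * Assumption of this sketch: g is the address of the symbol's GOT slot. */
int32_t got32x(uint32_t g, int32_t a, uint32_t got)
{
    return (int32_t)(g + (uint32_t)a - got);
}

/* -fno-common path: register holds the GOT base, displacement is GOTOFF.
 * The effective address is the variable itself. */
uint32_t ea_gotoff(void)
{
    return (uint32_t)GOT_BASE + (uint32_t)gotoff(SYM_B, 0, GOT_BASE);
}

/* -fcommon path: register holds the GOT base, displacement is GOT32X.
 * The effective address is the GOT slot; a second load is needed to
 * fetch the variable's address from it. */
uint32_t ea_got_slot(void)
{
    return (uint32_t)GOT_BASE + (uint32_t)got32x(GOT_SLOT_B, 0, GOT_BASE);
}

/* PS proposal: register keeps the thunk's PC, displacement is PC32 with
 * the addend set to the byte distance from that PC to the field. */
uint32_t ea_pc32_proposal(void)
{
    int32_t addend = (int32_t)(PLACE - RET_ADDR);
    return (uint32_t)RET_ADDR + (uint32_t)pc32(SYM_B, addend, PLACE);
}

/* Convenience check that the two direct paths agree on &b. */
void check_direct_paths(void)
{
    assert(ea_gotoff() == (uint32_t)SYM_B);
    assert(ea_pc32_proposal() == (uint32_t)SYM_B);
}
```

With these numbers the GOTOFF path and the PC32 variant both land on SYM_B, while the GOT32X path lands on the GOT slot and needs the extra load. This only shows that the arithmetic works out. It does not show that the assembler or linker would accept such a sequence.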