On Thu, Jun 27, 2019 at 12:59:07PM +0100, Vincenzo Frascino wrote:
> On 6/27/19 12:27 PM, Dave Martin wrote:
> > On Thu, Jun 27, 2019 at 11:57:36AM +0100, Vincenzo Frascino wrote:

[...]

> >> Disassembly of section .text:
> >> 0000000000000000 show_it:
> >>        0:  e8 03 1f aa   mov   x8, xzr
> >>        4:  09 68 68 38   ldrb  w9, [x0, x8]
> >>        8:  08 05 00 91   add   x8, x8, #1
> >>        c:  c9 ff ff 34   cbz   w9, #-8 <show_it+0x4>
> >>       10:  02 05 00 51   sub   w2, w8, #1
> >>       14:  e1 03 00 aa   mov   x1, x0
> >>       18:  08 08 80 d2   mov   x8, #64
> >>       1c:  01 00 00 d4   svc   #0
> >>       20:  c0 03 5f d6   ret
> >>
> >> Commands used:
> >>
> >> $ clang -target aarch64-linux-gnueabi main.c -O -c -o main.clang.<x>.o
> >> $ llvm-objdump -d main.clang.<x>.o
> >
> > Actually, I'm not sure this is comparable with the reproducer I quoted
> > in my last reply.
> >
>
> As explained in my previous email, this is the only case that can realistically
> happen. vDSO has no dependency on any other library (i.e. libgcc you were
> mentioning) and we are referring to the fallbacks which fall in this category.

Outlining could also introduce a local function call where none exists
explicitly in the program IIUC.

My point is that the interaction between asm reg vars and machine-level
procedure calls is at best ill-defined, and it is largely up to the
compiler when to introduce such a call, even without LTO etc.

So we should not be surprised to see variations in behaviour depending
on compiler, compiler version and compiler flags.

> > The compiler can see the definition of strlen and fully inlines it.
> > I only ever saw the problem when the compiler emits an out-of-line
> > implicit function call.
>
> > What does clang do with my example on 32-bit?
>
> When clang is selected compat vDSOs are currently disabled on arm64, will be
> introduced with a future patch series.
>
> Anyway since I am curious as well, this is what happens with your example with
> clang.8 target=arm-linux-gnueabihf:
>
> dave-code.clang.8.o:	file format ELF32-arm-little
>
> Disassembly of section .text:
> 0000000000000000 foo:
>        0:  00 00 00 ef   svc   #0
>        4:  1e ff 2f e1   bx    lr
>
> 0000000000000008 bar:
>        8:  10 4c 2d e9   push  {r4, r10, r11, lr}
>        c:  08 b0 8d e2   add   r11, sp, #8
>       10:  00 40 a0 e1   mov   r4, r0
>       14:  fe ff ff eb   bl    #-8 <bar+0xc>
>       18:  00 10 a0 e1   mov   r1, r0
>       1c:  04 00 a0 e1   mov   r0, r4
>       20:  00 00 00 ef   svc   #0
>       24:  10 8c bd e8   pop   {r4, r10, r11, pc}
> Compiled with -O2, -O3, -Os never inlines.

Looks sane, and is the behaviour we want.

> Same thing happens for aarch64-linux-gnueabi:
>
> dave-code.clang.8.o:	file format ELF64-aarch64-little
>
> Disassembly of section .text:
> 0000000000000000 foo:
>        0:  e0 03 00 2a   mov   w0, w0
>        4:  e1 03 01 2a   mov   w1, w1
>        8:  01 00 00 d4   svc   #0
>        c:  c0 03 5f d6   ret
>
> 0000000000000010 bar:
>       10:  01 0c c1 1a   sdiv  w1, w0, w1
>       14:  e0 03 00 2a   mov   w0, w0
>       18:  01 00 00 d4   svc   #0
>       1c:  c0 03 5f d6   ret

Curious, clang seems to be inserting some seemingly redundant moves
of its own here, though this shouldn't break anything.

I suspect that clang might require an X-reg holding an int to have its
top 32 bits zeroed for passing to an asm, whereas GCC does not.  I think
this comes under "we should not be surprised to see variations".

GCC 9 does this instead:

0000000000000000 <foo>:
   0:   d4000001        svc     #0x0
   4:   d65f03c0        ret

0000000000000008 <bar>:
   8:   1ac10c01        sdiv    w1, w0, w1
   c:   d4000001        svc     #0x0
  10:   d65f03c0        ret

> Based on this I think we can conclude our investigation.

So we use non-reg vars and use the asm clobber list and explicit moves
to get things into / out of the right registers?

Cheers
---Dave
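
For illustration only, a rough sketch of the shape that last question has
in mind: plain (non-register) C variables passed as "r" operands, explicit
moves into the syscall registers inside the asm, and those registers named
in the clobber list so the compiler keeps its own values out of them. The
helper name raw_write and the fallback to syscall(2) on other architectures
are made up for this example and are not taken from the patch series; the
arm64 path assumes the usual convention (x8 = syscall number, x0-x2 =
arguments, result in x0).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/*
 * Hypothetical helper: write(2) without any 'register ... asm("xN")'
 * variables. Everything that must sit in a fixed register is moved
 * there inside the asm body itself.
 */
long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__aarch64__)
	long ret;

	__asm__ volatile(
		"mov x0, %1\n\t"	/* arg0: fd */
		"mov x1, %2\n\t"	/* arg1: buf */
		"mov x2, %3\n\t"	/* arg2: len */
		"mov x8, %4\n\t"	/* syscall number */
		"svc #0\n\t"
		"mov %0, x0"		/* result out of x0 */
		: "=r" (ret)
		: "r" ((long)fd), "r" (buf), "r" (len), "r" ((long)SYS_write)
		/* Clobbered registers are never chosen for the operands. */
		: "x0", "x1", "x2", "x8", "memory", "cc");

	return ret;	/* raw kernel result: negative errno on failure */
#else
	/* Assumption: on other architectures just use the libc wrapper. */
	return syscall(SYS_write, fd, buf, len);
#endif
}
```

Because no value is tied to a register outside the asm statement, an
implicit call that the compiler emits before it (a libgcc division helper,
an outlined function) has nothing of ours to trample. The cost is a few
extra moves compared with the reg-var version.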