ARMv8, GCC 4.9 and necessary code optimized away?

Hi Everyone,

I'm working on an ARMv8 Mustang server board. It's an early ARMv8 board
(I believe it's the first ARM64 board), and it's missing the CRC32 and
Crypto extensions.

We have runtime feature tests that attempt to execute an instruction,
like CRC32 or AES, and catch the SIGILL if the instruction is missing.
It's kind of necessary to do it this way since reading an MSR (ARM's
equivalent of CPUID probing) results in a SIGILL for userland programs
(it requires Exception Level 1 or above).
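
For reference, on Linux the kernel also exports feature bits through
the auxiliary vector, which userland can read without touching a system
register. A minimal sketch, assuming glibc's getauxval() (2.16 or later)
and kernel headers that define HWCAP_CRC32 and HWCAP_AES for AArch64.
The function names are hypothetical, and this is not what our code does
today:

```cpp
#if defined(__linux__) && defined(__aarch64__)
# include <sys/auxv.h>   // getauxval, AT_HWCAP
# include <asm/hwcap.h>  // HWCAP_CRC32, HWCAP_AES (assumed to be present)
#endif

// Hypothetical helper: ask the kernel whether the CRC32 extension exists.
bool HasCRC32ViaHwcap()
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_CRC32)
    return (getauxval(AT_HWCAP) & HWCAP_CRC32) != 0;
#else
    return false;  // other platforms or old headers: report "not available"
#endif
}

// Hypothetical helper: same idea for the AES instructions.
bool HasAESViaHwcap()
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_AES)
    return (getauxval(AT_HWCAP) & HWCAP_AES) != 0;
#else
    return false;
#endif
}
```

This only helps on Linux, so a SIGILL-based probe would probably still
be needed as a fallback on other systems.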

It appears GCC is optimizing away the intrinsics we use to test for
the features. Later, because of the missing test, HasFeatureX()
returns TRUE and the program dies with a SIGILL.

The code is below. It's not clear to me if GCC is optimizing away the
code because it determines the call to setjmp() never fails, or if it
determines the ARM intrinsics are dead code.

How can we get the expected/desired behavior?

Jeff

********************

$ gcc --version
gcc (Debian/Linaro 4.9.2-10) 4.9.2

$ lsb_release -a
No LSB modules are available.
Distributor ID:    Debian
Description:    Debian GNU/Linux 8.4 (jessie)
Release:    8.4
Codename:    jessie

********************

Here is the CRC32 runtime feature test. The problem does not happen at
-O0. Tricks like making TryCRC32 volatile did not work.

static jmp_buf s_jmpNoCRC32;
static void SigIllHandlerCRC32(int)
{
    longjmp(s_jmpNoCRC32, 1);
}

static bool TryCRC32()
{
#if defined(__ARM_FEATURE_CRC32)
    // longjmp and clobber warnings. Volatile is required.
    volatile bool result = true;

    volatile SigHandler oldHandler = signal(SIGILL, SigIllHandlerCRC32);
    if (oldHandler == SIG_ERR)
        result = false;

    volatile sigset_t oldMask;
    if (sigprocmask(0, NULL, (sigset_t*)&oldMask))
        result = false;

    if (setjmp(s_jmpNoCRC32))
        result = false;
    else
    {
        uint32_t w=0, x=0; uint16_t y=0; uint8_t z=0;
        w = __crc32cw(w,x);
        w = __crc32ch(w,y);
        w = __crc32cb(w,z);
    }

    sigprocmask(SIG_SETMASK, (sigset_t*)&oldMask, NULL);
    signal(SIGILL, oldHandler);
    return result;
#else
    return false;
#endif
}
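
One possible workaround, assuming the intrinsics are being dropped as
dead code because nothing ever reads w: store the final value to a
volatile variable so the computation has an observable effect. This is
only a sketch. The names ending in "Sketch" and the s_crc32Sink
variable are hypothetical, and reading the inputs from the volatile is
an extra precaution against constant folding that may not be needed.

```cpp
#include <signal.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_CRC32)
# include <arm_acle.h>   // __crc32cw, __crc32ch, __crc32cb

typedef void (*SigHandler)(int);

static jmp_buf s_jmpNoCRC32Sketch;

// Hypothetical sink. Volatile accesses are observable side effects, so
// the values read from it and written to it cannot be discarded.
static volatile uint32_t s_crc32Sink = 0;

static void SigIllHandlerCRC32Sketch(int)
{
    longjmp(s_jmpNoCRC32Sketch, 1);
}
#endif

bool TryCRC32Sketch()
{
#if defined(__ARM_FEATURE_CRC32)
    // Volatile because it is written after setjmp returns via longjmp.
    volatile bool result = true;

    SigHandler oldHandler = signal(SIGILL, SigIllHandlerCRC32Sketch);
    if (oldHandler == SIG_ERR)
        return false;

    // Query the current mask so it can be restored after a longjmp.
    sigset_t oldMask;
    if (sigprocmask(0, NULL, &oldMask))
    {
        signal(SIGILL, oldHandler);
        return false;
    }

    if (setjmp(s_jmpNoCRC32Sketch))
        result = false;   // arrived here from the SIGILL handler
    else
    {
        // Inputs come from a volatile read, so they are not constants.
        uint32_t w = s_crc32Sink, x = s_crc32Sink;
        uint16_t y = 0; uint8_t z = 0;
        w = __crc32cw(w,x);
        w = __crc32ch(w,y);
        w = __crc32cb(w,z);
        s_crc32Sink = w;  // consume the result so the calls stay live
    }

    sigprocmask(SIG_SETMASK, &oldMask, NULL);
    signal(SIGILL, oldHandler);
    return result;
#else
    return false;
#endif
}
```

Checking the disassembly for crc32c* instructions after a change like
this would show whether the intrinsics actually survive at -O2 or -O3.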

********************

Here's what it looks like under the debugger.

Breakpoint 1, TryCRC32 () at cpu.cpp:425
425    {
(gdb) s
445        volatile SigHandler oldHandler = signal(SIGILL, SigIllHandlerCRC32);
(gdb) n
443        volatile bool result = true;
(gdb)
425    {
(gdb)
445        volatile SigHandler oldHandler = signal(SIGILL, SigIllHandlerCRC32);
(gdb)
443        volatile bool result = true;
(gdb)
445        volatile SigHandler oldHandler = signal(SIGILL, SigIllHandlerCRC32);
(gdb)
446        if (oldHandler == SIG_ERR)
(gdb)
450        if (sigprocmask(0, NULL, (sigset_t*)&oldMask))
(gdb)
453        if (setjmp(s_jmpNoCRC32))
(gdb)
463        sigprocmask(SIG_SETMASK, (sigset_t*)&oldMask, NULL);
(gdb)
464        signal(SIGILL, oldHandler);
(gdb)
465        return result;
(gdb) p result
$1 = true

********************

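Here's the disassembly. There are no crc32c* instructions in it. After
the setjmp call at +80, the zero-return path at +84 branches straight
to the sigprocmask call.

$ gdb -batch -ex 'disassemble TryCRC32' cpu.o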
Dump of assembler code for function TryCRC32():
   0x0000000000000148 <+0>:    stp    x29, x30, [sp,#-160]!
   0x000000000000014c <+4>:    adrp    x1, 0x0 <SigIllHandlerNEON(int)>
   0x0000000000000150 <+8>:    mov    w2, #0x1                       // #1
   0x0000000000000154 <+12>:    mov    x29, sp
   0x0000000000000158 <+16>:    mov    w0, #0x4                       // #4
   0x000000000000015c <+20>:    add    x1, x1, #0x0
   0x0000000000000160 <+24>:    strb    w2, [x29,#23]
   0x0000000000000164 <+28>:    bl    0x164 <TryCRC32()+28>
   0x0000000000000168 <+32>:    str    x0, [x29,#24]
   0x000000000000016c <+36>:    ldr    x0, [x29,#24]
   0x0000000000000170 <+40>:    cmn    x0, #0x1
   0x0000000000000174 <+44>:    b.eq    0x1d4 <TryCRC32()+140>
   0x0000000000000178 <+48>:    add    x2, x29, #0x20
   0x000000000000017c <+52>:    mov    x1, #0x0                       // #0
   0x0000000000000180 <+56>:    mov    w0, #0x0                       // #0
   0x0000000000000184 <+60>:    bl    0x184 <TryCRC32()+60>
   0x0000000000000188 <+64>:    cbnz    w0, 0x1cc <TryCRC32()+132>
   0x000000000000018c <+68>:    adrp    x0, 0x0 <SigIllHandlerNEON(int)>
   0x0000000000000190 <+72>:    add    x0, x0, #0x0
   0x0000000000000194 <+76>:    add    x0, x0, #0x138
   0x0000000000000198 <+80>:    bl    0x198 <TryCRC32()+80>
   0x000000000000019c <+84>:    cbz    w0, 0x1a4 <TryCRC32()+92>
   0x00000000000001a0 <+88>:    strb    wzr, [x29,#23]
   0x00000000000001a4 <+92>:    mov    x2, #0x0                       // #0
   0x00000000000001a8 <+96>:    add    x1, x29, #0x20
   0x00000000000001ac <+100>:    mov    w0, #0x2                       // #2
   0x00000000000001b0 <+104>:    bl    0x1b0 <TryCRC32()+104>
   0x00000000000001b4 <+108>:    ldr    x1, [x29,#24]
   0x00000000000001b8 <+112>:    mov    w0, #0x4                       // #4
   0x00000000000001bc <+116>:    bl    0x1bc <TryCRC32()+116>
   0x00000000000001c0 <+120>:    ldrb    w0, [x29,#23]
   0x00000000000001c4 <+124>:    ldp    x29, x30, [sp],#160
   0x00000000000001c8 <+128>:    ret
   0x00000000000001cc <+132>:    strb    wzr, [x29,#23]
   0x00000000000001d0 <+136>:    b    0x18c <TryCRC32()+68>
   0x00000000000001d4 <+140>:    strb    wzr, [x29,#23]
   0x00000000000001d8 <+144>:    b    0x178 <TryCRC32()+48>
End of assembler dump.


