On 30/05/16 05:00, Jeffrey Walton wrote:
Hi Everyone,

I'm working on an ARMv8 Mustang server board. It's an early ARMv8 board (I believe it's the first ARM-64 board), and it's missing the CRC32 and Crypto extensions.

We have runtime feature tests that attempt to execute an instruction, like CRC32 or AES, and catch the SIGILL if the instruction is missing. It's kind of necessary to do it this way since reading an MSR (ARM's equivalent of CPUID probing) results in a SIGILL for userland programs (it requires Exception Level 1 or above).

It appears GCC is optimizing away the intrinsics we placed to test for the features. Later, because of the missing test, HasFeatureX() returns TRUE and the program dies with a SIGILL.

The code is below. It's not clear to me if GCC is optimizing away the code because it determines the call to setjmp() never fails, or if it determines the ARM intrinsics are dead code.

How can we get the expected/desired behavior?

Jeff

********************

$ gcc --version
gcc (Debian/Linaro 4.9.2-10) 4.9.2

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 8.4 (jessie)
Release:        8.4
Codename:       jessie

********************

Here is the CRC32 runtime feature test. The problem does not happen at -O0. Tricks like making TryCRC32 volatile did not work.

static jmp_buf s_jmpNoCRC32;
static void SigIllHandlerCRC32(int)
{
    longjmp(s_jmpNoCRC32, 1);
}

static bool TryCRC32()
{
#if defined(__ARM_FEATURE_CRC32)
    // longjmp and clobber warnings. Volatile is required.
    volatile bool result = true;

    volatile SigHandler oldHandler = signal(SIGILL, SigIllHandlerCRC32);
    if (oldHandler == SIG_ERR)
        result = false;

    volatile sigset_t oldMask;
    if (sigprocmask(0, NULL, (sigset_t*)&oldMask))
        result = false;

    if (setjmp(s_jmpNoCRC32))
        result = false;
    else
    {
        uint32_t w=0, x=0;
        uint16_t y=0;
        uint8_t z=0;
        w = __crc32cw(w,x);
        w = __crc32ch(w,y);
        w = __crc32cb(w,z);
    }

Like Florian said, 'w' is not being used so the compiler may optimise the whole sequence away.

On a side note, if you want to detect the availability of certain extensions at runtime, have you considered using the hwcaps mechanism?

https://community.arm.com/groups/android-community/blog/2014/10/10/runtime-detection-of-cpu-features-on-an-armv8-a-cpu

Kyrill

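For reference, a minimal sketch of what an hwcaps-based check could look like on AArch64 Linux. It assumes glibc's getauxval() from <sys/auxv.h> is available; the bit positions mirror HWCAP_AES and HWCAP_CRC32 from the kernel's asm/hwcap.h. The names HasBit, HasCRC32Hwcap, kHwcapAES and kHwcapCRC32 are made up for illustration and are not part of the original code.

```cpp
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#endif

// AArch64 Linux hwcap bits (same values as HWCAP_AES and HWCAP_CRC32 in
// asm/hwcap.h). Hypothetical local names, defined here so the sketch
// compiles on any platform.
constexpr unsigned long kHwcapAES   = 1UL << 3;
constexpr unsigned long kHwcapCRC32 = 1UL << 7;

// Returns true if the given feature bit is set in an hwcap word.
constexpr bool HasBit(unsigned long hwcap, unsigned long bit)
{
    return (hwcap & bit) != 0;
}

// Asks the kernel, via the auxiliary vector, whether the CPU has the CRC32
// extension. No instruction is executed speculatively, so no SIGILL handler
// is needed. Reports false on platforms this sketch does not cover.
bool HasCRC32Hwcap()
{
#if defined(__linux__) && defined(__aarch64__)
    return HasBit(getauxval(AT_HWCAP), kHwcapCRC32);
#else
    return false;
#endif
}
```

With something like this, HasFeatureX() would not depend on whether the optimiser keeps a probe instruction, since the answer comes from the kernel rather than from a trapped instruction.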
    sigprocmask(SIG_SETMASK, (sigset_t*)&oldMask, NULL);
    signal(SIGILL, oldHandler);
    return result;
#else
    return false;
#endif
}

********************

Here's what it looks like under the debugger.

Breakpoint 1, TryCRC32 () at cpu.cpp:425
425     {
(gdb) s
445         volatile SigHandler oldHandler = signal(SIGILL, SigIllHandlerCRC32);
(gdb) n
443         volatile bool result = true;
(gdb)
425     {
(gdb)
445         volatile SigHandler oldHandler = signal(SIGILL, SigIllHandlerCRC32);
(gdb)
443         volatile bool result = true;
(gdb)
445         volatile SigHandler oldHandler = signal(SIGILL, SigIllHandlerCRC32);
(gdb)
446         if (oldHandler == SIG_ERR)
(gdb)
450         if (sigprocmask(0, NULL, (sigset_t*)&oldMask))
(gdb)
453         if (setjmp(s_jmpNoCRC32))
(gdb)
463         sigprocmask(SIG_SETMASK, (sigset_t*)&oldMask, NULL);
(gdb)
464         signal(SIGILL, oldHandler);
(gdb)
465         return result;
(gdb) p result
$1 = true

********************

$ gdb -batch -ex 'disassemble TryCRC32' cpu.o
Dump of assembler code for function TryCRC32():
   0x0000000000000148 <+0>:     stp     x29, x30, [sp,#-160]!
   0x000000000000014c <+4>:     adrp    x1, 0x0 <SigIllHandlerNEON(int)>
   0x0000000000000150 <+8>:     mov     w2, #0x1                        // #1
   0x0000000000000154 <+12>:    mov     x29, sp
   0x0000000000000158 <+16>:    mov     w0, #0x4                        // #4
   0x000000000000015c <+20>:    add     x1, x1, #0x0
   0x0000000000000160 <+24>:    strb    w2, [x29,#23]
   0x0000000000000164 <+28>:    bl      0x164 <TryCRC32()+28>
   0x0000000000000168 <+32>:    str     x0, [x29,#24]
   0x000000000000016c <+36>:    ldr     x0, [x29,#24]
   0x0000000000000170 <+40>:    cmn     x0, #0x1
   0x0000000000000174 <+44>:    b.eq    0x1d4 <TryCRC32()+140>
   0x0000000000000178 <+48>:    add     x2, x29, #0x20
   0x000000000000017c <+52>:    mov     x1, #0x0                        // #0
   0x0000000000000180 <+56>:    mov     w0, #0x0                        // #0
   0x0000000000000184 <+60>:    bl      0x184 <TryCRC32()+60>
   0x0000000000000188 <+64>:    cbnz    w0, 0x1cc <TryCRC32()+132>
   0x000000000000018c <+68>:    adrp    x0, 0x0 <SigIllHandlerNEON(int)>
   0x0000000000000190 <+72>:    add     x0, x0, #0x0
   0x0000000000000194 <+76>:    add     x0, x0, #0x138
   0x0000000000000198 <+80>:    bl      0x198 <TryCRC32()+80>
   0x000000000000019c <+84>:    cbz     w0, 0x1a4 <TryCRC32()+92>
   0x00000000000001a0 <+88>:    strb    wzr, [x29,#23]
   0x00000000000001a4 <+92>:    mov     x2, #0x0                        // #0
   0x00000000000001a8 <+96>:    add     x1, x29, #0x20
   0x00000000000001ac <+100>:   mov     w0, #0x2                        // #2
   0x00000000000001b0 <+104>:   bl      0x1b0 <TryCRC32()+104>
   0x00000000000001b4 <+108>:   ldr     x1, [x29,#24]
   0x00000000000001b8 <+112>:   mov     w0, #0x4                        // #4
   0x00000000000001bc <+116>:   bl      0x1bc <TryCRC32()+116>
   0x00000000000001c0 <+120>:   ldrb    w0, [x29,#23]
   0x00000000000001c4 <+124>:   ldp     x29, x30, [sp],#160
   0x00000000000001c8 <+128>:   ret
   0x00000000000001cc <+132>:   strb    wzr, [x29,#23]
   0x00000000000001d0 <+136>:   b       0x18c <TryCRC32()+68>
   0x00000000000001d4 <+140>:   strb    wzr, [x29,#23]
   0x00000000000001d8 <+144>:   b       0x178 <TryCRC32()+48>
End of assembler dump.
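
The dump contains no crc32c* instructions, which is consistent with the intrinsics having been removed as dead code because 'w' is never read. One possible way to keep the probe alive, offered as a sketch rather than a verified fix for this GCC version, is to make both the input and the result observable through volatile objects. The names ProbeCRC32 and g_crcSink are hypothetical, and the intrinsics are only compiled in when __ARM_FEATURE_CRC32 is defined (for example when building with -march=armv8-a+crc).

```cpp
#include <cstdint>
#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>   // __crc32cw, __crc32ch, __crc32cb
#endif

// Hypothetical global sink. A store to a volatile object is an observable
// side effect, so the value written to it has to be computed.
static volatile uint32_t g_crcSink;

// Runs the CRC32 probe instructions and publishes the result. Intended to
// replace the body of the else branch that follows setjmp() in TryCRC32().
uint32_t ProbeCRC32()
{
    // Reading the seed through a volatile keeps the inputs from being
    // treated as compile-time constants.
    volatile uint32_t seed = 0;
    uint32_t w = seed;
#if defined(__ARM_FEATURE_CRC32)
    w = __crc32cw(w, static_cast<uint32_t>(seed));
    w = __crc32ch(w, static_cast<uint16_t>(seed));
    w = __crc32cb(w, static_cast<uint8_t>(seed));
#endif
    g_crcSink = w;      // the result is now used, so it is not dead code
    return g_crcSink;
}
```

Calling ProbeCRC32() from the else branch, in place of the three bare intrinsic calls, should leave the crc32c* instructions in the generated code; rerunning the disassemble command above is a quick way to confirm it for a given compiler and optimisation level.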