Hi.

I am wondering whether some improvements could be made in the generation of x86 FPU code. Here is a simple sign function:

    float signf4(float x)
    {
        return x < 0.0f ? -1.0f : 1.0f;
    }

It generates the following assembler code (GCC 4.7.2, g++ -m32 -O3 -fverbose-asm -save-temps -g3 -ggdb -march=native):

    _Z6signf4f:
    .LFB84:
            .loc 1 27 0
            .cfi_startproc
    .LVL4:
            .loc 1 30 0
            fld1
            fldz
            flds    4(%esp)     # x
            fxch    %st(1)      # ??? Why?
            fucomip %st(1), %st #,
            ffreep  %st(0)      #
            fld1
            fchs
            fcmovbe %st(1), %st #,,
            fstp    %st(1)      #
            .loc 1 31 0
            ret

I am wondering why the fxch instruction is necessary, and why the code is not like this instead:

    _Z6signf4f:
    .LFB84:
            .loc 1 27 0
            .cfi_startproc
    .LVL4:
            .loc 1 30 0
            fld1
            flds    4(%esp)     # ??? Load the parameter before the zero
            fldz                # ??? to avoid the fxch instruction.
            fucomip %st(1), %st #,
            [...]

Loading x before the zero leaves 0.0 in %st(0) and x in %st(1) when fucomip executes, which is exactly the state GCC's version reaches only after the fxch.

-- 
VZ
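A toy C++ model of the x87 register stack can make the stack argument above concrete. This is a sketch under stated assumptions, not GCC's or Intel's definition of the instructions: the names X87, gccPrologue, proposedPrologue, epilogue and samePrologueState are hypothetical, ffreep is modelled as a plain pop, and only values and the CF/ZF/PF flags are tracked (no tag word, precision control or exceptions). It replays both instruction orders and checks that they leave the same stack contents and flags after fucomip, so the shared remainder of the sequence computes the same result.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Reference semantics from the message.
float signf4(float x) { return x < 0.0f ? -1.0f : 1.0f; }

// Condition flags written by fucomip (only the ones used here).
struct Flags {
    bool cf, zf, pf;
};

// Hypothetical toy model of the x87 stack; regs.back() plays %st(0).
// It tracks values only: no tag word, precision control or exceptions.
struct X87 {
    std::vector<double> regs;
    Flags flags{false, false, false};

    double& st(std::size_t i) { return regs[regs.size() - 1 - i]; }
    void push(double v) { regs.push_back(v); }
    void pop() { regs.pop_back(); }

    void fld1() { push(1.0); }
    void fldz() { push(0.0); }
    void flds(float m) { push(m); }            // flds mem32
    void fxch1() { std::swap(st(0), st(1)); }  // fxch %st(1)
    void fchs() { st(0) = -st(0); }

    // fucomip %st(1), %st: compare st(0) with st(1), then pop.
    // Unordered (a NaN operand) sets ZF, PF and CF.
    void fucomip1() {
        double a = st(0), b = st(1);
        if (std::isnan(a) || std::isnan(b)) {
            flags = {true, true, true};
        } else {
            flags = {a < b, a == b, false};
        }
        pop();
    }
};

// GCC 4.7.2's order from the message: fld1; fldz; flds; fxch; fucomip.
X87 gccPrologue(float x) {
    X87 s;
    s.fld1();      // st0=1
    s.fldz();      // st0=0, st1=1
    s.flds(x);     // st0=x, st1=0, st2=1
    s.fxch1();     // st0=0, st1=x, st2=1
    s.fucomip1();  // compares 0 with x, pops: st0=x, st1=1
    return s;
}

// The reordered sequence proposed in the message: no fxch needed.
X87 proposedPrologue(float x) {
    X87 s;
    s.fld1();      // st0=1
    s.flds(x);     // st0=x, st1=1
    s.fldz();      // st0=0, st1=x, st2=1
    s.fucomip1();  // compares 0 with x, pops: st0=x, st1=1
    return s;
}

// Shared tail: ffreep %st(0); fld1; fchs; fcmovbe %st(1),%st; fstp %st(1).
float epilogue(X87 s) {
    s.pop();                // ffreep %st(0), modelled as a plain pop
    s.fld1();
    s.fchs();               // st0=-1, st1=1
    if (s.flags.cf || s.flags.zf) {
        s.st(0) = s.st(1);  // fcmovbe: taken when 0 <= x or unordered
    }
    s.st(1) = s.st(0);      // fstp %st(1): copy st0 to st1, then pop
    s.pop();
    return static_cast<float>(s.st(0));
}

// True if both orders leave identical stack contents and flags.
bool samePrologueState(float x) {
    X87 a = gccPrologue(x);
    X87 b = proposedPrologue(x);
    bool same = a.regs.size() == b.regs.size();
    for (std::size_t i = 0; same && i < a.regs.size(); ++i) {
        same = a.regs[i] == b.regs[i] ||
               (std::isnan(a.regs[i]) && std::isnan(b.regs[i]));
    }
    return same && a.flags.cf == b.flags.cf && a.flags.zf == b.flags.zf &&
           a.flags.pf == b.flags.pf;
}
```

For example, epilogue(proposedPrologue(-3.0f)) yields -1.0f, and for -0.0f or a NaN the reordered sequence still agrees with signf4, because fcmovbe moves +1.0 in both the equal (ZF) and unordered cases.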