Re: Traps for signed arithmetic overflow

Segher Boessenkool <segher@xxxxxxxxxxxxxxxxxxx> · Fri, 23 Nov 2018 14:28:22 -0600

Hi!

On Fri, Nov 23, 2018 at 09:01:56PM +0100, Helmut Eller wrote:
> when compiling this example with gcc -O2 -ftrapv:
> 
>   long foo (long x, long y) { return x + y; }
>   
>   long bar (long x, long y) {
>     long z;
>     if (__builtin_add_overflow (x, y, &z))
>       __builtin_trap ();
>     return z;
>   }
> 
> then GCC seems to produce less efficient code for foo than for bar:
> 
>   foo:
>         subq    $8, %rsp
>         call    __addvdi3@PLT
>         addq    $8, %rsp
>         ret
> 
>   bar:
>         movq    %rdi, %rax
>         addq    %rsi, %rax
>         jo      .L9
>         rep ret
>   .L9:
>         ud2
> 
> I see several inefficiencies:
> 
> 1.) __addvdi3 is not inlined.

__addvdi3 is implemented in libgcc.  The x86 target code does not handle
addvdi3, only addvdi4 (the 3-operand form calls abort on overflow; the
4-operand form jumps to the label given as its 4th operand).

> 2.) %rsp is adjusted before calling __addvdi3.  Why is that needed?

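To keep the stack aligned to 16 bytes at the call, as the x86-64 ABI
requires: on entry %rsp is 8 bytes off a 16-byte boundary because of the
pushed return address.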

> 3.) Obviously __addvdi3 is not implemented as sibling-call even though
>     -O2 should enable that.

It calls via the PLT.  Do sibling calls via the PLT work in your ABI?

> Where should I start, if I wanted to teach GCC how to produce the same
> code for foo as for bar?  Would it be enough to add a pattern to
> i386.md?  There is already a pattern for "addv<mode>4", but apparently
> it's not used in this case.

As Marc says, -ftrapv is probably not the way to go.

Adding an addv<mode>3 to the i386 backend might help.

You do *not* want exactly the same code, btw; addv3 calls abort on
overflow, which is not the same as executing a ud2 instruction.
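
The difference is observable from outside the process.  A minimal sketch,
assuming a POSIX system (fork/waitpid); the helper names are made up for
illustration.  abort raises SIGABRT, while what __builtin_trap raises is
target-dependent (on x86 the ud2 typically shows up as SIGILL).

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helpers, named for this sketch only. */
static void die_by_abort(void) { abort(); }
static void die_by_trap(void)  { __builtin_trap(); }

/* Run fn in a child process.  Returns the number of the signal that
   killed the child, 0 if it exited normally, or -1 on error. */
static int fatal_signal_of(void (*fn)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0);   /* only reached if fn returns */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Comparing fatal_signal_of(die_by_abort) with fatal_signal_of(die_by_trap)
shows the two failure modes; a program that installs a SIGABRT handler
would see one and not the other.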

Segher