Hello,

when compiling this example with gcc -O2 -ftrapv:

long foo (long x, long y)
{
  return x + y;
}

long bar (long x, long y)
{
  long z;
  if (__builtin_add_overflow (x, y, &z))
    __builtin_trap ();
  return z;
}

GCC seems to produce less efficient code for foo than for bar:

foo:
        subq    $8, %rsp
        call    __addvdi3@PLT
        addq    $8, %rsp
        ret

bar:
        movq    %rdi, %rax
        addq    %rsi, %rax
        jo      .L9
        rep ret
.L9:
        ud2

I see several inefficiencies:

1.) __addvdi3 is not inlined.
2.) %rsp is adjusted before calling __addvdi3. Why is that needed?
3.) The call to __addvdi3 is evidently not emitted as a sibling call,
    even though -O2 should enable that optimization.

Where should I start if I wanted to teach GCC to produce the same code
for foo as for bar? Would it be enough to add a pattern to i386.md?
There is already a pattern for "addv<mode>4", but apparently it is not
used in this case.

Helmut