I've reduced the faulty test case to the following code:

=================================
a;
long b;
register unsigned long current_stack_pointer asm("rsp");
handle_external_interrupt_irqoff() {
  asm("and $0xfffffffffffffff0, %%rsp\n\tpush $%c[ss]\n\tpush "
      "%[sp]\n\tpushf\n\tpushq $%c[cs]\n\tcall *%[thunk_target]\n"
      : [ sp ] "=&r"(b), "+r" (current_stack_pointer)
      : [ thunk_target ] "rm"(a), [ ss ] "i"(3 * 8), [ cs ] "i"(2 * 8) );
}
=================================

(in fact creduce even throws away current_stack_pointer, but we probably want to keep it to prove the point).

Clang generates the following code for it:

$ clang vmx.i -O2 -c -w -o vmx.o
$ objdump -d vmx.o
...
0000000000000000 <handle_external_interrupt_irqoff>:
   0:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 6 <handle_external_interrupt_irqoff+0x6>
   6:   89 44 24 fc             mov    %eax,-0x4(%rsp)
   a:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
   e:   6a 18                   pushq  $0x18
  10:   50                      push   %rax
  11:   9c                      pushfq
  12:   6a 10                   pushq  $0x10
  14:   ff 54 24 fc             callq  *-0x4(%rsp)
  18:   48 89 05 00 00 00 00    mov    %rax,0x0(%rip)        # 1f <handle_external_interrupt_irqoff+0x1f>
  1f:   c3                      retq

Note that the thunk target is spilled to -0x4(%rsp) before the asm block, and the call at offset 0x14 then goes through the same RSP-relative operand after the asm has already realigned RSP and pushed four words. The call therefore no longer reads the slot that was written.

The question is whether using current_stack_pointer as an output is actually a valid way to tell the compiler that the asm modifies RSP, so that it must not rely on RSP-relative operands across it. Intuitively it is, but explicitly adding RSP to the clobber list sounds a bit more bulletproof.