On Fri, Feb 09, 2024, Linus Torvalds wrote: > Sean? Does this work for the case you noticed? Yep. You can quite literally see the effect of the asm(""). A "good" sequence directly propagates the result from the VMREAD's destination register to its final destination <+1756>: mov $0x280e,%r13d <+1762>: vmread %r13,%r13 <+1766>: jbe 0x209fa <sync_vmcs02_to_vmcs12+1834> <+1768>: mov %r13,0xe8(%rbx) whereas the "bad" sequence bounces through a different register. <+1780>: mov $0x2810,%eax <+1785>: vmread %rax,%rax <+1788>: jbe 0x209e4 <sync_vmcs02_to_vmcs12+1812> <+1790>: mov %rax,%r12 <+1793>: mov %r12,0xf0(%rbx) > That (b) is very much voodoo programming, but it matches the old magic > barrier thing that Jakub Jelinek suggested for the really *old* gcc > bug wrt plain (non-output) "asm goto". The underlying bug for _that_ > was fixed long ago: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 > > We removed that for plain "asm goto" workaround a couple of years ago, > so "asm_volatile_goto()" has been a no-op since June 2022, but this > now resurrects that hack for the output case. > > I'm not loving it, but Sean seemed to confirm that it fixes the code > generation problem, so ... Yeah, I'm in the same boat.