I remember experimenting with "+X" a while ago; see for instance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59155 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59159.
On Fri, 5 Jan 2024, David Brown via Gcc-help wrote:
I was looking at the code gcc generates for optimisations and
re-arrangements when -ffast-math is enabled, using code like this on
<https://godbolt.org>:
typedef float T;
T test(T a, T b) {
    T x = a + b;
    //asm ("" :: "X" (x));
    asm ("" : "+X" (x));
    //asm ("" : "+X" (x) : "X" (x));
    return x - b;
}
Without the asm statements, gcc - as expected - skips the calculation of "x"
and can then simplify "a + b - b" to "a". I have previously used inline
assembly of the form:
asm ("" : "+g" (x));
to tell gcc "You need to calculate x before running this assembly and put it
in a general register or memory, but it might change during the assembly so
you must forget anything you knew about it before". I've found it useful to
force particular orders on calculations, or for debugging, or as a kind of
fine-tuned alternative to a memory barrier.
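
As a rough illustration of the "+g" idiom described above, here is a minimal
sketch using an integer, where a general register is a natural home for the
value. The names hide_int and add_then_sub are made up for this example and do
not come from the original message; the sketch assumes gcc or a compatible
compiler.

```c
/* Hypothetical helper, not part of the original post. */
static inline int hide_int(int v) {
    /* Empty template: no instructions are emitted, but the compiler must
       have v available in a general register or memory before this point
       and must treat it as possibly modified afterwards. */
    __asm__ ("" : "+g" (v));
    return v;
}

int add_then_sub(int a, int b) {
    int x = hide_int(a + b);  /* the compiler can no longer assume x == a + b */
    return x - b;             /* so this should not be folded to "a" */
}
```

Since the template is empty, the value itself is unchanged at run time; only
the compiler's knowledge about it is discarded.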
But the "+g" constraint is not ideal for floating-point variables - it forces
the compiler to move the variable from a floating-point register into a
general-purpose register, then back again. The ideal choice seems to be
"+X", since "X" matches any operand whatsoever.
However, when I use just "asm ("" : "+X" (x));", I get an error message
"error: inconsistent operand constraints in an 'asm'". I have no idea why
this is an issue.
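
One possible way to sidestep both the general-register round trip and the
"+X" error, at the cost of portability, is a target-specific register-class
constraint. The sketch below is only an assumption about how that might look:
it uses "x" (any SSE register) on x86-64 and "w" (floating-point/SIMD
register) on AArch64, and falls back to "m" (memory) elsewhere. The names
hide_float and test_fp are hypothetical and not from the original message.

```c
/* Hypothetical helper; the constraint letters are target-specific. */
static inline float hide_float(float v) {
#if defined(__x86_64__)
    __asm__ ("" : "+x" (v));   /* "x": any SSE register */
#elif defined(__aarch64__)
    __asm__ ("" : "+w" (v));   /* "w": floating-point/SIMD register */
#else
    __asm__ ("" : "+m" (v));   /* fallback: force the value through memory */
#endif
    return v;
}

float test_fp(float a, float b) {
    float x = hide_float(a + b);  /* x must be computed before the asm */
    return x - b;                 /* and is treated as unknown afterwards */
}
```

This loses the target independence that "X" promises, so it is a workaround
rather than an answer to the question.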
Getting weirder, on x86-64, there is no error if I use:
asm ("" : "+X" (x) : "X" (x));
This gives me the desired effect of forcing "x" to be calculated and used in
the final "x - b".
Even weirder, on 32-bit ARM, this still gives the inconsistent operand error.
Weirder still, this works error-free on both targets:
asm ("" :: "X" (x));
asm ("" : "+X" (x));
In my (non-exhaustive) testing, this gives optimal results on both targets,
independent of the compiler version and type T.
I'd imagine that the "X" constraint doesn't see much use in real inline assembly
- on x86 and ARM the assembly instruction template would usually depend on
where the data is put. But if anyone can explain this behaviour to me, I am
very curious to know what is going on.
David
--
Marc Glisse