I think I may have answered my own question. In the section on
clobbers, the docs say:
"You may not write a clobber description in a way that overlaps with an
input or output operand."
"There is no way for you to specify that an input operand is modified
without also specifying it as an output operand."
By implication, if you modify an input you are expected to say so,
and you can't do that with a clobber. Reading between the lines, the
assumption is that input-only operands must not be modified.
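If that reading is right, the fix for moo is to declare the registers that rep stosb changes (RDI advances, RCX counts down to zero) as read-write "+" operands and leave only AL as a pure input. This is a minimal sketch, assuming x86-64 and GCC-style extended asm. The names moo_rw and example_rw are hypothetical. Count is switched to std::size_t so that all of RCX, which rep stosb uses as its count in 64-bit mode, holds a defined value. The memset branch is an assumed fallback so the sketch still compiles on other targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical variant of moo: every register that "rep stosb" changes is
// declared as a read-write ("+") operand, so GCC knows RDI and RCX no longer
// hold Dest and Count afterwards. AL is only read, so Data stays an input.
inline void moo_rw(unsigned char *Dest, unsigned char Data, std::size_t Count) {
#if defined(__x86_64__)
  asm volatile("rep stosb"
               : "+D"(Dest), "+c"(Count)  // RDI advances, RCX counts down to 0
               : "a"(Data)                // AL: byte to store, left unchanged
               : "memory");               // the stored bytes are not operands
#else
  std::memset(Dest, Data, Count);  // assumed fallback for non-x86-64 targets
#endif
}

// Usage: two back-to-back calls on the same buffer, as in main() in the post.
inline void example_rw() {
  unsigned char buff1[32];
  moo_rw(buff1, 0, sizeof(buff1));
  moo_rw(buff1, 0xAB, sizeof(buff1));
  assert(buff1[0] == 0xAB && buff1[31] == 0xAB);
}
```

With Dest and Count declared this way, GCC has to set RDI and RCX again before the second call instead of assuming they still hold the old values.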
Which brings us to the next question: If input-only parameters are
assumed to be unchanged, why are some being re-loaded? These
unnecessary instructions affect both program size and execution performance.
The code below shows two instances of unnecessary instructions being
generated:
1) rdi being loaded (unnecessarily) at 407818.
2) eax being zeroed twice.
I can provide code that shows more.
I'm trying to decide how important this is. If it's only happening
because I'm using an asm block, that's one thing: it isn't good, but I'm
not optimistic about what will happen to a bug report that only affects
the performance of code around inline asm.
On the other hand, it seems possible that this is a more general problem
where the optimizer is simply missing a category of optimization
opportunities. OK, not likely, but possible.
To file or not to file. That is the question.
If anyone has any thoughts or insight to share, I'd love to hear it.
dw
On 7/2/2013 6:41 PM, dw wrote:
I'm trying to understand how input parameters are used by gcc's
extended asm. Each time I think I've got a handle on it, it does
something else unexpected.
For example, this C++ code:
inline void moo(unsigned char *Dest, unsigned char Data, int Count) {
    asm volatile (
        "rep stosb"
        : /* no outputs */
        : "D" (Dest), "a" (Data), "c" (Count)
        : "memory");
}
int main()
{
    unsigned char buff1[32];
    moo(buff1, 0, sizeof(buff1));
    moo(buff1, 0, sizeof(buff1));
    return 0;
}
Compiling for 64-bit (x86-64) using -Os, I get:
0000000000407800 <main>:
407800: push rdi
407801: sub rsp,0x40
407805: call 4022d0 <__main>
40780a: lea rdi,[rsp+0x20]
40780f: mov ecx,0x20
407814: xor eax,eax
407816: rep stos BYTE PTR es:[rdi],al
407818: lea rdi,[rsp+0x20]
40781d: rep stos BYTE PTR es:[rdi],al
40781f: xor eax,eax
407821: add rsp,0x40
407825: pop rdi
407826: ret
Now, there are a few noteworthy things here:
1) ecx gets loaded for the first call, but not the second.
2) rdi gets loaded for both calls.
3) eax gets zeroed before the first call but not before the second,
then gets zeroed again for the return value.
When I saw that ecx wasn't being reloaded, I speculated that inputs
are assumed to be unchanged by the asm unless they are also listed as
outputs. This was not what I expected, but on reflection, I could
see how it made sense.
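If inputs really are assumed unchanged unless listed as outputs, another documented way to list a modified input is to tie it to a throwaway output with a matching constraint ("0" means "the same location as operand 0"). Unlike "+", the input side can then be any expression, not just an lvalue. This is a sketch under the same assumptions (x86-64, GCC-style extended asm, memset fallback elsewhere); moo_tied, example_tied, dest_out and count_out are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical variant of moo: Dest and Count stay input expressions, but each
// is tied to a dummy output via a matching constraint ("0", "1"). The dummy
// outputs tell GCC that RDI and RCX hold different values after the asm.
inline void moo_tied(unsigned char *Dest, unsigned char Data, std::size_t Count) {
#if defined(__x86_64__)
  unsigned char *dest_out;  // final RDI value, never used
  std::size_t count_out;    // final RCX value (always 0), never used
  asm volatile("rep stosb"
               : "=D"(dest_out), "=c"(count_out)
               : "0"(Dest), "1"(Count), "a"(Data)
               : "memory");
  (void)dest_out;   // silence unused-variable warnings
  (void)count_out;
#else
  std::memset(Dest, Data, Count);  // assumed fallback for non-x86-64 targets
#endif
}

// Usage: rvalue expressions as the pointer and count; fills bytes 8..31 only.
inline void example_tied() {
  unsigned char buff1[32];
  std::memset(buff1, 0xFF, sizeof(buff1));
  moo_tied(buff1 + 8, 0, sizeof(buff1) - 8);
  assert(buff1[7] == 0xFF && buff1[8] == 0 && buff1[31] == 0);
}
```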
But if that's true, why does rdi get re-loaded each time? My first
guess was that the "memory" clobber was causing this. But removing it
didn't change the asm that got generated.
And why is eax zeroed for the first call, not for the second, and then
zeroed again for the return value? If the optimizer assumes input values
are unchanged by asm blocks, why did it need to reassign it at all, and
only some of the time?
I have tried other experiments in an attempt to understand the pattern
here, but the more I try, the less clear things become. Rather than
post all my tests here and make this message harder to read, I'll
start with the most important question, then ask a follow-up or two:
When can and can't you (safely) modify extended asm input-only
parameters? Unlike output parameters (which must be lvalues), inputs
are expressions. Does this mean they are supposed to be modifiable at
will? Or must they (all and always) be treated as read-only?
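One conservative reading is to treat every input as read-only and let the asm do its own register setup: copy the inputs into RDI and RCX inside the template and list those registers as clobbers. The GCC docs say clobbered registers are never used for input or output operands, so GCC will not place an operand there and will not expect their old contents to survive. This is a sketch assuming x86-64 and GCC's default AT&T assembler syntax (the template will not assemble with -masm=intel); moo_scratch and example_scratch are hypothetical names, with a memset fallback elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical variant of moo: all operands are inputs and are never modified.
// The asm copies them into RDI/RCX itself and declares those registers as
// clobbered, so GCC keeps operands out of them and discards their contents.
inline void moo_scratch(unsigned char *Dest, unsigned char Data, std::size_t Count) {
#if defined(__x86_64__)
  asm volatile("mov %0, %%rdi\n\t"  // AT&T syntax: source, destination
               "mov %1, %%rcx\n\t"
               "rep stosb"
               : /* no outputs */
               : "r"(Dest), "r"(Count), "a"(Data)
               : "rdi", "rcx", "memory");
#else
  std::memset(Dest, Data, Count);  // assumed fallback for non-x86-64 targets
#endif
}

// Usage: fill a whole stack buffer.
inline void example_scratch() {
  unsigned char buff1[32];
  moo_scratch(buff1, 0x3C, sizeof(buff1));
  assert(buff1[0] == 0x3C && buff1[31] == 0x3C);
}
```

The price is the two mov instructions, which the asm executes even when GCC could have loaded RDI and RCX directly.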
dw