To answer my own question:
The (somewhat opaque) error message is trying to say that the register
is already in use (as an input operand). Listing it in the clobber list
therefore makes the clobbers overlap the input/output constraints,
which the docs say very clearly you cannot do.
The line from the docs quoted below must therefore be referring to
registers used *other* than the ones named in the input/output constraints.
My solution for telling the compiler that these registers get modified is this:
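__CRT_INLINE VOID __stosb(PBYTE Dest, BYTE Data, SIZE_T Count)
{
    PBYTE junk1; /* dummy output; its value is never used */

    /* volatile: the outputs are unused, so without it gcc may discard the asm */
    __asm__ __volatile__
    (
        "cld; rep; stosb"
        : "=D" (junk1), "=a" (junk1), "=c" (junk1) /* these regs no longer hold the inputs */
        : "0" (Dest), "1" (Data), "2" (Count)      /* tie each input to its output */
        : "memory", "cc"                           /* writes memory; cld changes flags */
    );
}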
Note the addition of the volatile qualifier and the (otherwise unused)
junk1 variable. I'm surprised it let me use the same variable for all
three outputs, but it does.
I wondered if this might result in extra stack bytes being allocated to
hold this local variable, but that doesn't appear to be the case (which
is what I want).
So, while this seems like a clunky way to clobber the registers, it does
produce the code I was looking for.
dw
On 3/30/2013 2:04 PM, dw wrote:
I read this line from the docs:
"If you refer to a particular hardware register from the assembler
code, you probably have to list the register after the third colon to
tell the compiler the register's value is modified."
My own observation shows that this is true. However, adding the
register in question to the clobber list produces a compile error.
The asm (essentially memset):
__CRT_INLINE VOID __stosb(PBYTE Dest, BYTE Data, SIZE_T Count)
{
    __asm__
    (
        "cld; rep; stosb"
        :
        : "D" (Dest), "a" (Data), "c" (Count)
        : "edi", "memory", "cc"
    );
}
The error:
error: can't find a register in class 'DIREG' while reloading 'asm'
error: 'asm' operand has impossible constraints
Without the edi clobber, this C++ code:
__stosb((PBYTE)&c, 0, sizeof(c));
__stosb((PBYTE)&c, 0, sizeof(c));
generates this asm:
402cd3: cld
402cd4: rep stos BYTE PTR es:[rdi],al
402cd6: cld
402cd7: rep stos BYTE PTR es:[rdi],al
Since rdi is not declared as modified, gcc assumes it still holds Dest
and doesn't reload it between the calls (likewise with rcx).
While I might be able to fake the compiler out by specifying outputs
(probably need the volatile qualifier too), I don't really want to
change Dest, I just want to use it as an input.
What's the right way to go here?
dw