I'm getting some unexpected behavior from gcc. I'm not prepared to call
it a bug. I just want to understand what I'm seeing.
In my code (included below), I:
1) Create a 100-byte buffer, and set buff[5] to 'A'.
2) Call __stosb, which uses inline asm to overwrite all of buff with 'B'.
3) Use a memory constraint in __stosb to flush buff instead of using the
"memory" clobber. The size of the memory block used in the constraint
is controlled by a #define.
With this, I have a simple test to see if the memory constraint is
correctly causing the buffer to get flushed by the asm call. If it is
flushing the buffer, printing buff[5] after __stosb will print 'B'. If
it is not flushing, it will print 'A'. The results were a bit surprising.
- Since buff[5] is the 6th byte in the buffer, using memory constraint
sizes of 1, 2 & 4 (not surprisingly) all print 'A'.
- Sizes of 8 and 16 print 'B'. This is also the expected result, since
I am now flushing enough of buff to include buff[5].
- The surprise comes from using a size of 3 or 5. These also print
'B'. WTF? Why would 4 not flush, and 3 flush?
I believe the answer comes from reading the RTL. The difference between
sizes of 3 and 16 comes here:
Size 3:

    (set (mem/c:BLK (plus:DI (reg/f:DI 7 sp)
                (const_int 32 [0x20]))
            [ MEM[(struct _reallybigstruct *)&buff]+0 S3 A128])
        (asm_operands/v:BLK ("rep stos{b|b}") ("=m") 2 [

Size 16:

    (set (mem/c:TI (plus:DI (reg/f:DI 7 sp)
                (const_int 32 [0x20]))
            [ MEM[(struct _reallybigstruct *)&buff]+0 S16 A128])
        (asm_operands/v:TI ("rep stos{b|b}") ("=m") 2 [

While I don't really read RTL, TI clearly refers to TImode. Apparently
when using a size that exactly matches a mode, asm memory references can
flush the right number of bytes. But if not, gcc seems to fall back
to BLK mode.
Which brings us to the essential question here:
Does using BLK mode here *just* flush all of buff? Or does it perform a
full asm "memory" clobber and flush everything?
I've been experimenting, and (unfortunately) it looks like it does the
full clobber (see second program below), but I could use some
confirmation. I could also use an opinion on whether that is the
intended behavior, or whether something is just going wrong.
Being able to use memory constraints could be a nice performance win
over forcing a full memory clobber.
Thanks,
dw
------------------------------------------------------------
Here's the code (compiled with gcc version 4.9.0 x86_64-win32-seh-rev2,
using -O2 -fdump-final-insns):

// Code that shows weirdness with memory constraints
#include <stdio.h>

#define MYSIZE 3

inline void
__stosb(unsigned char *Dest, unsigned char Data, size_t Count)
{
    struct _reallybigstruct { char x[MYSIZE]; }
        *p = (struct _reallybigstruct *)Dest;

    __asm__ __volatile__ ("rep stos{b|b}"
        : "+D" (Dest), "+c" (Count), "=m" (*p)
        : [Data] "a" (Data)
        //: "memory"
    );
}

int main()
{
    unsigned char buff[100];
    buff[5] = 'A';

    __stosb(buff, 'B', sizeof(buff));
    printf("%c\n", buff[5]);
}

-------------------------------------
Here is my attempt to prove that a full clobber is being performed.
Compile this code (as above), and look at the -S output. With a size
of 8, the assignment to buff2[5] comes after the "rep stosb". Change
the size to 3, and the assignment moves before it. If 3 is really
causing a full memory clobber and 8 is not, this is the behavior I would
expect. While not exactly conclusive, it sure looks like a full clobber.

// Code that tries to prove a full "memory" clobber is being performed.
#include <stdio.h>

#define MYSIZE 3

inline void
__stosb(unsigned char *Dest, unsigned char Data, size_t Count)
{
    struct _reallybigstruct { char x[MYSIZE]; }
        *p = (struct _reallybigstruct *)Dest;

    __asm__ __volatile__ ("rep stos{b|b}"
        : "+D" (Dest), "+c" (Count), "=m" (*p)
        : [Data] "a" (Data)
        //: "memory"
    );
}

int main()
{
    unsigned char buff1[100], buff2[100];
    buff1[5] = 'A';
    buff2[5] = 'M';

    asm("#" : : "r" (buff2));
    __stosb(buff1, 'B', sizeof(buff1));
    printf("%c %c\n", buff1[5], buff2[5]);
}