On 14/07/2011 16:24, Parmenides wrote:
2011/7/14 Ian Lance Taylor <iant@xxxxxxxxxx>:
Parmenides <mobile.parmenides@xxxxxxxxx> writes:
2. "and not optimize stores or loads to that memory"
Apart from caching memory values in registers, is there any other
optimization of stores or loads to memory?
Not in this case, I think it's just another way of saying the same
thing.
I think that reordering instructions involving memory operations, as
opposed to optimizing the stores or loads themselves, might also count
as such an optimization. It seems that a "memory" clobber will prevent
gcc from doing this kind of reordering. If so, it would be helpful if
the manual said something about it.
I'm sorry, I don't understand what you mean.
To understand some of gcc's features without knowing the details of its
internals, I have to write small examples in C, compile them to
assembly, and study the output to get some idea of what is going on.
Caching memory values in registers is one optimization gcc performs;
reordering instructions is another. A "memory" clobber in an inline asm
may affect both. I have written an example in C to try to understand
the former.
int s = 0;

int tst(int lim)
{
    int i;
    for (i = 1; i < lim; i++)
        s = s + i;
    asm volatile(
        "nop"
        ::: "memory"
    );
    s = s * 10;
    return s;
}
To compile the C source, the following command is executed:
gcc -S -O tst.c
The corresponding assembly code is as follows:
tst:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %ecx
        cmpl    $1, %ecx
        jle     .L2
        movl    s, %edx
        movl    $1, %eax
.L4:
        addl    %eax, %edx
        incl    %eax
        cmpl    %eax, %ecx
        jne     .L4
        movl    %edx, s    <--- After the loop, s is written back to memory.
.L2:
        movl    s, %eax    <--- Before evaluating 's = s * 10', s is reloaded into a register.
        leal    (%eax,%eax,4), %eax
        addl    %eax, %eax
        movl    %eax, s
        popl    %ebp
        ret
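
For comparison, here is a sketch of the same function with the "memory"
clobber removed (the name tst_noclobber and the separate global s2 are
just illustrative, so both versions can sit in one file). Without the
clobber I would expect gcc to be free to keep the value cached in a
register across the asm, so the write-back before the nop and the
reload after it should disappear; the exact output will of course
depend on the gcc version and options.

int s2 = 0;

int tst_noclobber(int lim)
{
    int i;
    for (i = 1; i < lim; i++)
        s2 = s2 + i;
    /* volatile keeps the asm itself in place, but with no "memory"
       clobber gcc may keep s2 in a register across it */
    asm volatile("nop");
    s2 = s2 * 10;
    return s2;
}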
So the "memory" clobber has prevented that optimization. But for the
latter case, namely instruction reordering, I have not been able to
construct an example like the one above to illustrate how a "memory"
clobber prevents reordering. I don't know under what circumstances gcc
will reorder instructions, and without such a case I cannot observe the
effect of the "memory" clobber.
I don't think you're going to find a suitable example, because I don't
think a memory barrier will interact much with other memory
optimisations, such as re-ordered loads and stores, speculative loads,
etc. The barrier gives you a point in the code that says "all memory
operations before this point should be completed, and no memory
operations after this point should be started".
If you've got code like this:
extern int data[32];

void foo(void) {
    int a = data[0];
    int b = data[1];
    data[2] = b;
    data[3] = a;

    asm volatile ("" ::: "memory");

    int c = data[0];
    int d = data[1];
    data[4] = c;
    data[5] = d;
}
The compiler is still free to re-arrange the loads and stores of data
/before/ the memory barrier. On a superscalar cpu, it might choose to
read data[1] before data[0], to better hide the read latencies. Or it
might issue a speculative load or a cache pre-load instruction first.
Or it might store data[3] before data[2], to improve the pipelining. Or
it might load data[0] and data[1] with a double-register load
instruction. There are lots of potential "memory optimisations"
available, and the compiler can do any of them. The only thing the
barrier does is separate the function into two halves, and the compiler
can't re-order memory operations across the barrier.
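
If you want to see the one effect the barrier does have, a simple
comparison is to compile foo() alongside a copy with the barrier
removed (foo_nobarrier below is just an illustrative name) and diff the
assembly, e.g. with "gcc -S -O2". With the barrier, the compiler has to
assume data[0] and data[1] may have changed and reload them for c and
d; without it, it is normally free to reuse the values already loaded
into a and b, so the second pair of loads typically disappears. This is
only a sketch; the exact code generated depends on the compiler version
and options.

extern int data[32];

/* Same as foo() above, but without the memory barrier.  The compiler
   may reuse the values already loaded for a and b when computing c
   and d, so data[0] and data[1] are typically read only once. */
void foo_nobarrier(void) {
    int a = data[0];
    int b = data[1];
    data[2] = b;
    data[3] = a;

    int c = data[0];
    int d = data[1];
    data[4] = c;
    data[5] = d;
}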