On 14/07/2011 16:24, Parmenides wrote:
2011/7/14 Ian Lance Taylor <iant@xxxxxxxxxx>:
Parmenides <mobile.parmenides@xxxxxxxxx> writes:
2. "and not optimize stores or loads to that memory"
Apart from caching memory values in registers, is there any other
optimization of stores or loads to memory?
Not in this case, I think it's just another way of saying the same
thing.
I think that reordering instructions involving memory operations, as
opposed to optimizing the stores or loads themselves, might also count
as such an optimization. It seems that a "memory" clobber will prevent
gcc from doing this kind of reordering. If so, it would be helpful if
the manual said something about it.
I'm sorry, I don't understand what you mean.
To understand some of gcc's features without knowing the details of its
internals, I have to write small examples in C, compile them to
assembly, and study the output to get some idea of what is going on.
Caching memory values in registers is one optimization gcc performs;
reordering instructions is another. A "memory" clobber in an inline asm
may affect both. I have written an example in C to try to understand
the former.
int s = 0;

int tst(int lim)
{
    int i;
    for (i = 1; i < lim; i++)
        s = s + i;
    asm volatile(
        "nop"
        ::: "memory"
    );
    s = s * 10;
    return s;
}
To compile the C source, the following command is executed:
gcc -S -O tst.c
The corresponding assembly code is as follows:
tst:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %ecx
        cmpl    $1, %ecx
        jle     .L2
        movl    s, %edx
        movl    $1, %eax
.L4:
        addl    %eax, %edx
        incl    %eax
        cmpl    %eax, %ecx
        jne     .L4
        movl    %edx, s    <--- After the loop, s is written back to memory.
.L2:
        movl    s, %eax    <--- Before evaluating 's = s * 10', s is reloaded into a register.
        leal    (%eax,%eax,4), %eax
        addl    %eax, %eax
        movl    %eax, s
        popl    %ebp
        ret
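
For comparison, here is a sketch of the same function with the "memory"
clobber removed (the name tst_noclobber and the separate global s2 are
just illustrative, so both versions can sit in one file). Without the
clobber I would expect gcc to be free to keep the value cached in a
register across the asm, so the write-back before the nop and the
reload after it should disappear; the exact output will of course
depend on the gcc version and options.

int s2 = 0;

int tst_noclobber(int lim)
{
    int i;
    for (i = 1; i < lim; i++)
        s2 = s2 + i;
    /* volatile keeps the asm itself in place, but with no "memory"
       clobber gcc may keep s2 in a register across it */
    asm volatile("nop");
    s2 = s2 * 10;
    return s2;
}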
So the "memory" clobber has prevented that optimization. But for the
latter case, namely instruction reordering, I have not been able to
construct an example like the one above to illustrate how a "memory"
clobber prevents reordering. I don't know under what circumstances gcc
will reorder instructions, and without such a case I cannot observe the
effect of the "memory" clobber.
I don't think you're going to find a suitable example, because I don't
think a memory barrier will interact much with other memory
optimisations, such as re-ordered loads and stores, speculative loads,
etc. The barrier gives you a point in the code that says "all memory
operations before this point should be completed, and no memory
operations after this point should be started".
If you've got code like this:
extern int data[32];

void foo(void) {
    int a = data[0];
    int b = data[1];
    data[2] = b;
    data[3] = a;

    asm volatile ("" ::: "memory");

    int c = data[0];
    int d = data[1];
    data[4] = c;
    data[5] = d;
}
The compiler is still free to re-arrange the loads and stores of data
/before/ the memory barrier. On a superscalar cpu, it might choose to
read data[1] before data[0], to better hide the read latencies. Or it
might issue a speculative load or a cache pre-load instruction first.
Or it might store data[3] before data[2], to improve the pipelining. Or
it might load data[0] and data[1] with a double-register load
instruction. There are lots of potential "memory optimisations"
available, and the compiler can do any of them. The only thing the
barrier does is separate the function into two halves, and the compiler
can't re-order memory operations across the barrier.
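
If you want to see the one effect the barrier does have, a simple
comparison is to compile foo() alongside a copy with the barrier
removed (foo_nobarrier below is just an illustrative name) and diff the
assembly, e.g. with "gcc -S -O2". With the barrier, the compiler has to
assume data[0] and data[1] may have changed and reload them for c and
d; without it, it is normally free to reuse the values already loaded
into a and b, so the second pair of loads typically disappears. This is
only a sketch; the exact code generated depends on the compiler version
and options.

extern int data[32];

/* Same as foo() above, but without the memory barrier.  The compiler
   may reuse the values already loaded for a and b when computing c
   and d, so data[0] and data[1] are typically read only once. */
void foo_nobarrier(void) {
    int a = data[0];
    int b = data[1];
    data[2] = b;
    data[3] = a;

    int c = data[0];
    int d = data[1];
    data[4] = c;
    data[5] = d;
}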