Re: optimization problem: ptr not kept in register

On 03/26/2014 06:48 PM, Ian Lance Taylor wrote:

>> My point was that the out buffer's cur ptr gets loaded/stored all the time,
>> even stored more than once in succession on certain paths. Yes,
>> encode_noinline() could, and actually, will modify the cur ptr. But that
>> call is on a marked unlikely path, while the likely path doesn't contain any
>> calls, so could work entirely with registers. The loading/storing of cur on
>> the likely path is a pessimization that affects performance.
>>
>> I hope this clarifies it. Is it then an optimizer issue?
>
> I see what you mean.  You want the compiler to pull the value out of
> memory for the likely loop and then store it back into memory for the
> unlikely case.  That seems possible.  My first thought is that that
> would be a moderately costly optimization that would very rarely pay
> off, but I could be wrong.
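
For reference, the transformation described in the quoted reply can be written by hand roughly as follows. This is only a sketch: OutBuf, Node and the body of encode_noinline() are hypothetical stand-ins (the real definitions are not part of this message), and the split between inline and out-of-line cases is my guess from the listing further down. The cur ptr lives in a local on the likely path and is written back and reloaded only around the unlikely call.

```cpp
#include <cassert>

// Hypothetical stand-ins: the real OutBuf, Node and encode_noinline()
// are not shown in this message.
struct OutBuf { int* cur; };
struct Node   { int data; Node* next; };

// Out-of-line slow path, assumed here to handle values above 0xffff.
__attribute__((noinline)) void encode_noinline(OutBuf& out, int v)
{
    *out.cur++ = 0;
    *out.cur++ = 0;
    if (v > 0xffffff) *out.cur++ = 0;
    *out.cur++ = v;
}

void encode_node_list_manual(OutBuf& out, const Node* n)
{
    int* cur = out.cur;                        // pull the pointer out of memory once
    for (; n; n = n->next) {
        int v = n->data;
        if (__builtin_expect(v > 0xffff, 0)) { // marked unlikely
            out.cur = cur;                     // store back before the call
            encode_noinline(out, v);           // callee advances out.cur
            cur = out.cur;                     // reload after the call
            continue;
        }
        if (v > 0xff) *cur++ = 0;              // likely path: local pointer only
        *cur++ = v;
    }
    out.cur = cur;                             // final write-back
}

// Usage sketch: encode four nodes and check how far cur advanced.
void demo_manual()
{
    int buf[16];
    Node d{0x12345678, nullptr};
    Node c{0x123456, &d};
    Node b{0x1234, &c};
    Node a{0x12, &b};
    OutBuf out{buf};
    encode_node_list_manual(out, &a);
    assert(out.cur - buf == 10);               // 1 + 2 + 3 + 4 words
}
```

__attribute__((noinline)) and __builtin_expect are GCC extensions, which seems fair in this thread; whether GCC can be made to do the same thing on its own is exactly the open question.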

Thanks, that might be part of it, but it seems that something else is at play here. To test the theory that the function call prevents the cur ptr from being kept in a register, I commented out the noinline attribute on encode_noinline() (line 13), so that it gets inlined. There are no function calls left, but now I'm really puzzled:

Dump of assembler code for function encode_node_list(OutBuf&, Node*):
   0x0000000000400600 <+0>:    test   %rsi,%rsi
   0x0000000000400603 <+3>:    je     0x40062f <encode_node_list(OutBuf&, Node*)+47>
// load outbuf's cur ptr
   0x0000000000400605 <+5>:    mov    (%rdi),%rax
   0x0000000000400608 <+8>:    jmp    0x400613 <encode_node_list(OutBuf&, Node*)+19>
   0x000000000040060a <+10>:    nopw   0x0(%rax,%rax,1)
   0x0000000000400610 <+16>:    mov    %rcx,%rax
// load the data
   0x0000000000400613 <+19>:    mov    (%rsi),%edx
// calc next cur
   0x0000000000400615 <+21>:    lea    0x4(%rax),%rcx
// store!
   0x0000000000400619 <+25>:    mov    %rcx,(%rdi)
   0x000000000040061c <+28>:    cmp    $0xff,%edx
   0x0000000000400622 <+34>:    jg     0x400631 <encode_node_list(OutBuf&, Node*)+49>
   0x0000000000400624 <+36>:    mov    %edx,(%rax)
   0x0000000000400626 <+38>:    mov    0x8(%rsi),%rsi
   0x000000000040062a <+42>:    test   %rsi,%rsi
   0x000000000040062d <+45>:    jne    0x400610 <encode_node_list(OutBuf&, Node*)+16>
   0x000000000040062f <+47>:    repz retq
   0x0000000000400631 <+49>:    lea    0x8(%rax),%rcx
   0x0000000000400635 <+53>:    cmp    $0xffff,%edx
   0x000000000040063b <+59>:    movl   $0x0,(%rax)
// store again!
   0x0000000000400641 <+65>:    mov    %rcx,(%rdi)
   0x0000000000400644 <+68>:    jg     0x40064b <encode_node_list(OutBuf&, Node*)+75>
   0x0000000000400646 <+70>:    mov    %edx,0x4(%rax)
   0x0000000000400649 <+73>:    jmp    0x400626 <encode_node_list(OutBuf&, Node*)+38>
// from here: the code of encode_noinline()
   0x000000000040064b <+75>:    cmp    $0xffffff,%edx
   0x0000000000400651 <+81>:    movl   $0x0,0x4(%rax)
   0x0000000000400658 <+88>:    jg     0x400666 <encode_node_list(OutBuf&, Node*)+102>
   0x000000000040065a <+90>:    lea    0xc(%rax),%rcx
// and store again!
   0x000000000040065e <+94>:    mov    %rcx,(%rdi)
   0x0000000000400661 <+97>:    mov    %edx,0x8(%rax)
   0x0000000000400664 <+100>:    jmp    0x400626 <encode_node_list(OutBuf&, Node*)+38>
   0x0000000000400666 <+102>:    lea    0x10(%rax),%rcx
   0x000000000040066a <+106>:    movl   $0x0,0x8(%rax)
// and again!
   0x0000000000400671 <+113>:    mov    %rcx,(%rdi)
   0x0000000000400674 <+116>:    mov    %edx,0xc(%rax)
   0x0000000000400677 <+119>:    jmp    0x400626 <encode_node_list(OutBuf&, Node*)+38>

I naively thought that if everything is inlined, and for code this simple, the ptr would be kept in a register the whole time: loaded once at the beginning, stored once at the end. What is going on?
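
To make concrete what I expected the optimizer to produce, here is a hand-written version with the ptr cached in a local. It is a sketch only: OutBuf and Node are hypothetical reconstructions (cur as the first member, data and next in Node, as the offsets in the listing suggest), and the encoding is what I read off the listing, namely one, two or three zero words before values above 0xff, 0xffff and 0xffffff.

```cpp
#include <cassert>

// Hypothetical reconstructions; the layout is only inferred from the listing.
struct OutBuf { int* cur; };
struct Node   { int data; Node* next; };

// Fully inlined encoder with the cur ptr held in a local variable.
void encode_node_list_cached(OutBuf& out, const Node* n)
{
    int* cur = out.cur;               // load once at the beginning
    for (; n; n = n->next) {
        int v = n->data;
        if (v > 0xff)     *cur++ = 0; // each threshold adds one zero word
        if (v > 0xffff)   *cur++ = 0;
        if (v > 0xffffff) *cur++ = 0;
        *cur++ = v;
    }
    out.cur = cur;                    // store once at the end
}

// Usage sketch: three nodes should produce 1 + 2 + 3 = 6 words.
void demo_cached()
{
    int buf[8];
    Node c{0x123456, nullptr};
    Node b{0x1234, &c};
    Node a{0x12, &b};
    OutBuf out{buf};
    encode_node_list_cached(out, &a);
    assert(out.cur - buf == 6);
    assert(buf[0] == 0x12 && buf[2] == 0x1234 && buf[5] == 0x123456);
}
```

With the local copy there is nothing in the loop that touches out.cur, so no store to (%rdi) should be needed until the function returns.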

I thought about aliasing rules, too. I deliberately chose int* instead of char*, because in the latter case the rules say that a write through a char* may alias anything, so it invalidates everything. But for an int*, writing an int to memory can't invalidate the pointer itself, because they are different types. The strict aliasing rules say, if I'm not mistaken, that a write through a pointer invalidates all values that were read through pointers to the same type and are live in registers, so they must be reloaded, unless the pointer being read or written through is declared __restrict, which means it doesn't alias other pointers (of the same type). Am I right?
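
To illustrate the char* half of that reasoning, here is a small sketch; ByteBuf, put_bytes() and char_store_can_change_cur() are made-up names for this example, not code from my test case. It shows that a store through an unsigned char* can legally change the cur ptr itself, which, as far as I understand, is why the compiler could not keep such a ptr in a register across those stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical byte-oriented variant of the output buffer.
struct ByteBuf { unsigned char* cur; };

// Writes n bytes through dst. Since dst has character type, each store
// may legally modify any object, including a ByteBuf::cur.
void put_bytes(unsigned char* dst, const unsigned char* src, std::size_t n)
{
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

// Returns true if stores through a char pointer changed buf.cur itself.
bool char_store_can_change_cur()
{
    unsigned char storage[16];
    ByteBuf buf{storage};
    unsigned char* wanted = storage + 8;

    // Object representation of the pointer value we want cur to hold.
    unsigned char bytes[sizeof wanted];
    std::memcpy(bytes, &wanted, sizeof wanted);

    // The destination aliases buf.cur; this is allowed for unsigned char.
    put_bytes(reinterpret_cast<unsigned char*>(&buf.cur), bytes, sizeof bytes);
    return buf.cur == wanted;
}
```

The same trick with an int* destination would access a pointer object through an int lvalue, which the aliasing rules don't allow, so with int* the compiler may assume the stores leave cur alone.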

If I compile with -fno-strict-aliasing, then the cur ptr is reloaded each time after a 0 write is performed, as expected. Interestingly, the code is 17 bytes shorter than above.

So with strict aliasing, the unnecessary loads are eliminated, but why are there unnecessary stores?

Thanks, Peter




