Re: optimization problem: ptr not kept in register


On Wed, Mar 26, 2014 at 12:36 AM, Peter A. Felvegi
<petschy@xxxxxxxxxxxxxxxxxx> wrote:
> On 03/26/2014 12:32 AM, Ian Lance Taylor wrote:
>>
>> On Tue, Mar 25, 2014 at 12:38 PM, Peter A. Felvegi
>> <petschy@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> The reduced test case is at the end. It encodes data into a buffer in a
>>> loop with a variable-length encoding (not a real, working encoding). For
>>> some reason, the write ptr is not kept in a register, but loaded/stored
>>> whenever it is used/updated. There is a potential function call in the
>>> loop, but there are __builtin_expect hints, so I think it would be
>>> possible to use a register for the ptr, store it just before the call,
>>> and load it back right after the call. This would speed up the common
>>> code path: less code, fewer loads and stores. I measured around 20-30%
>>> more runtime compared to a version where a pointer goes in and the
>>> updated ptr is returned. However, passing/returning the ptr has other
>>> issues, esp. for a decoder, which would normally return the decoded
>>> value, not the ptr.
>>
>> You marked the encode_noinline function as noinline, and encode can
>> call encode_noinline.  The encode_noinline function could change any
>> part of global memory, and in particular could change the value of
>> n->next.  So the loop has to reload that value, in case it was
>> changed.
>
> I think you misunderstood. The node ptr is in %rbx, which is callee-saved.
> After writing the data to the cur ptr at +38 (mov %esi, (%rax)), n=n->next
> is performed at +40: mov    0x8(%rbx),%rbx. This is as simple as it gets;
> there are no reloads.
>
> My point was that the out buffer's cur ptr gets loaded/stored all the time,
> even stored more than once in succession on certain paths. Yes,
> encode_noinline() could, and actually, will modify the cur ptr. But that
> call is on a marked unlikely path, while the likely path doesn't contain any
> calls, so could work entirely with registers. The loading/storing of cur on
> the likely path is a pessimization that affects performance.
>
> I hope this clarifies it. Is it then an optimizer issue?

I see what you mean.  You want the compiler to pull the value out of
memory for the likely loop and then store it back into memory for the
unlikely case.  That seems possible.  My first thought is that that
would be a moderately costly optimization that would very rarely pay
off, but I could be wrong.

Ian
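
For illustration, the transformation discussed above can be written out by
hand. This is only a sketch, not the reduced test case from the thread (which
is not reproduced here): struct node, struct outbuf, encode_slow and both
encode_list_* functions are hypothetical names, and the byte format is made
up. The first loop updates the write pointer through memory on every
iteration; the second keeps it in a local variable and synchronizes with
memory only around the unlikely call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types, assumed for this sketch only. */
struct node   { uint32_t value; struct node *next; };
struct outbuf { uint8_t *cur; uint8_t *end; };

/* Hypothetical slow path: writes a made-up 2-byte form and advances
 * out->cur itself. Data is silently dropped when the buffer is full. */
__attribute__((noinline))
void encode_slow(struct outbuf *out, uint32_t v)
{
    assert(out->cur <= out->end);
    if (out->end - out->cur < 2)
        return;
    out->cur[0] = (uint8_t)(0x80 | (v & 0x7f));
    out->cur[1] = (uint8_t)((v >> 7) & 0xff);
    out->cur += 2;
}

/* Version 1: the write pointer lives in memory (out->cur) throughout. */
void encode_list_mem(struct outbuf *out, const struct node *n)
{
    for (; n != NULL; n = n->next) {
        if (__builtin_expect(n->value < 0x80 && out->cur != out->end, 1))
            *out->cur++ = (uint8_t)n->value;   /* likely: one byte */
        else
            encode_slow(out, n->value);        /* unlikely: call */
    }
}

/* Version 2: the write pointer is cached in a local. It is stored back
 * just before the unlikely call and reloaded right after it. This assumes
 * the slow path never changes out->end. */
void encode_list_reg(struct outbuf *out, const struct node *n)
{
    uint8_t *cur = out->cur;
    uint8_t *const end = out->end;

    for (; n != NULL; n = n->next) {
        if (__builtin_expect(n->value < 0x80 && cur != end, 1)) {
            *cur++ = (uint8_t)n->value;        /* no load/store of out->cur */
        } else {
            out->cur = cur;                    /* publish before the call */
            encode_slow(out, n->value);
            cur = out->cur;                    /* pick up the callee's update */
        }
    }
    out->cur = cur;                            /* final write-back */
}
```

Both functions are meant to produce the same bytes. Whether the second form is
actually faster depends on the compiler version, flags and target, so it is
worth checking the generated assembly rather than assuming it.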



