Re: [PATCH 1/4] Tile caching performance patches

In fact, with -O2, gcc generates more complex assembly for % than for
&, although it does not emit an integer division. Assembly generated
for the version using &:

tile_data_pointer:
.LFB29:
        movzwl  8(%rdi), %eax
        andl    $63, %edx
        andl    $63, %esi
        imull   %eax, %edx
        movzbl  7(%rdi), %eax
        addl    %esi, %edx
        imull   %eax, %edx
        movslq  %edx,%rax
        addq    24(%rdi), %rax
        ret

Assembly generated for the version using %:

tile_data_pointer:
.LFB29:
        movl    %edx, %eax
        sarl    $31, %eax
        shrl    $26, %eax
        addl    %eax, %edx
        andl    $63, %edx
        subl    %eax, %edx
        movzwl  8(%rdi), %eax
        imull   %eax, %edx
        movl    %esi, %eax
        sarl    $31, %eax
        shrl    $26, %eax
        addl    %eax, %esi
        andl    $63, %esi
        subl    %eax, %esi
        movzbl  7(%rdi), %eax
        addl    %esi, %edx
        imull   %eax, %edx
        movslq  %edx,%rax
        addq    24(%rdi), %rax
        ret
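
The extra sarl/shrl/addl/subl instructions in the % version look like
the usual fix-up for signed operands: C's % truncates toward zero, so
a negative x must give a negative (or zero) remainder, which a plain
mask cannot produce. Below is a minimal sketch of that difference. The
names offset_mod, offset_mask, offset_mod_expanded and check_offsets
are made up for illustration, and TILE_WIDTH = 64 is an assumption
taken from the $63 mask in the listings, not from the GIMP source.

```c
#include <assert.h>

#define TILE_WIDTH 64   /* assumption: implied by the $63 mask above */

/* Signed remainder as written with %: truncates toward zero. */
static int offset_mod (int x)
{
  return x % TILE_WIDTH;
}

/* Masked variant: result is always in [0, 63], even for negative x. */
static int offset_mask (int x)
{
  return x & (TILE_WIDTH - 1);
}

/* Hand-written equivalent of the longer instruction sequence. */
static int offset_mod_expanded (int x)
{
  /* sarl $31 ; shrl $26  ->  63 if x is negative, 0 otherwise */
  int bias = (x < 0) ? TILE_WIDTH - 1 : 0;

  /* addl ; andl $63 ; subl */
  return ((x + bias) & (TILE_WIDTH - 1)) - bias;
}

/* The expanded form matches % everywhere in the tested range;
 * the mask matches % only for non-negative x. */
static void check_offsets (void)
{
  int x;

  for (x = -1000; x <= 1000; x++)
    {
      assert (offset_mod_expanded (x) == offset_mod (x));
      if (x >= 0)
        assert (offset_mask (x) == offset_mod (x));
    }
}
```

For x = -1, offset_mod gives -1 while offset_mask gives 63, so the two
macros are interchangeable only if x and y are never negative. With
unsigned operands the two forms agree and gcc can emit the single andl
on its own.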


On Tue, Jun 2, 2009 at 5:09 PM, Sven Neumann <sven@xxxxxxxx> wrote:
> Hi,
>
> On Tue, 2009-06-02 at 16:56 -0400, Christopher Montgomery wrote:
>
>> > As far as I know pretty much any compiler out there should be able to
>> > replace a modulo by a power-of-2 constant by the bit-wise AND operation
>> > without us explicitly doing so (see also
>> > http://en.wikipedia.org/wiki/Modulo_operation#Performance_issues). So
>> > for the benefit of readable code I suggest that we keep the code as it
>> > is.
>>
>> Interesting.  I got a noticeable and repeatable performance benefit.
>> Which is not to say I haven't somehow mismeasured it.  I agree the
>> modulo is more readable.
>>
>> ...perhaps the difference is the difference of (x) or (y) possibly
>> being negative and additional conformance-related assembly getting
>> generated? I suppose there's no reason to speculate, I'll go read the
>> assembly gcc generates and that will answer everything, at least for
>> me.
>
> I might very well be wrong here. If there's indeed a difference in the
> generated assembly and a noticeable performance benefit, then let's use
> the optimized macro. But perhaps we can add a short comment there
> explaining that ((y) & (TILE_HEIGHT-1)) is equivalent to ((y) %
> TILE_HEIGHT). Not everyone reading this code will be aware of this
> immediately.
>
>
> Sven
>
>
>
_______________________________________________
Gimp-developer mailing list
Gimp-developer@xxxxxxxxxxxxxxxxxxxxxx
https://lists.XCF.Berkeley.EDU/mailman/listinfo/gimp-developer

