Re: [PATCH 1/5] add SWAP macro

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 30.01.2017 um 21:48 schrieb Johannes Schindelin:
So I tried to verify that Visual C optimizes this well, and oh my deity,
this was not easy. In Debug mode, it does not optimize, i.e. the memcpy()
will be called, even for simple 32-bit integers. In Release mode, Visual
Studio's defaults turn on "whole-program optimization" which means that
the only swapping that is going on in the end is that the meaning of two
registers is swapped, i.e. the SWAP() macro is expanded to... no assembler
code at all.

After trying this and that and something else, I finally ended up with the
file-scope optimized SWAP() resulting in this assembly code:

00007FF791311000  mov         eax,dword ptr [rcx]
00007FF791311002  mov         r8d,dword ptr [rdx]
00007FF791311005  mov         dword ptr [rcx],r8d
00007FF791311008  mov         dword ptr [rdx],eax

This looks good.

However, recent events (including some discussions on this mailing list
which had unfortunate consequences) made me trust my instincts more. And
my instincts tell me that it would be unwise to replace code that swaps
primitive types in the straight-forward way by code that relies on
advanced compiler optimization to generate efficient code, otherwise
producing very suboptimal code.

I don't know how difficult it was to arrive at the result above, but I wouldn't call inlining memcpy(3) an advanced optimization (anymore?), since the major compilers seem to be doing that.

The SWAP in prio-queue.c seems to be the one with the biggest performance impact. Or perhaps it's the one in lookup_object()? The former is easier to measure, though.

Here's what I get with CFLAGS="-builtin -O2" (best of five):

$ time ./t/helper/test-prio-queue $(seq 100000 -1 1) dump >/dev/null

real    0m0.142s
user    0m0.120s
sys     0m0.020s

And this is with CFLAGS="-no-builtin -O2":

$ time ./t/helper/test-prio-queue $(seq 100000 -1 1) dump >/dev/null

real    0m0.170s
user    0m0.156s
sys     0m0.012s

Hmm.  Not nice, but also not prohibitively slow.

The commit you quoted embarrasses me, and I have no excuse for it. I would
love to see that myswap() ugliness fixed by replacing it with a construct
that is simpler, and generates good code even without any smart compiler.

I don't see a way to do that without adding a type parameter.

René



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]