Am 30.01.2017 um 21:48 schrieb Johannes Schindelin:
So I tried to verify that Visual C optimizes this well, and oh my deity, this was not easy. In Debug mode, it does not optimize, i.e. the memcpy() will be called, even for simple 32-bit integers. In Release mode, Visual Studio's defaults turn on "whole-program optimization" which means that the only swapping that is going on in the end is that the meaning of two registers is swapped, i.e. the SWAP() macro is expanded to... no assembler code at all. After trying this and that and something else, I finally ended up with the file-scope optimized SWAP() resulting in this assembly code: 00007FF791311000 mov eax,dword ptr [rcx] 00007FF791311002 mov r8d,dword ptr [rdx] 00007FF791311005 mov dword ptr [rcx],r8d 00007FF791311008 mov dword ptr [rdx],eax
This looks good.
However, recent events (including some discussions on this mailing list which had unfortunate consequences) made me trust my instincts more. And my instincts tell me that it would be unwise to replace code that swaps primitive types in the straight-forward way by code that relies on advanced compiler optimization to generate efficient code, otherwise producing very suboptimal code.
I don't know how difficult it was to arrive at the result above, but I wouldn't call inlining memcpy(3) an advanced optimization (anymore?), since the major compilers seem to be doing that.
The SWAP in prio-queue.c seems to be the one with the biggest performance impact. Or perhaps it's the one in lookup_object()? The former is easier to measure, though.
Here's what I get with CFLAGS="-builtin -O2" (best of five): $ time ./t/helper/test-prio-queue $(seq 100000 -1 1) dump >/dev/null real 0m0.142s user 0m0.120s sys 0m0.020s And this is with CFLAGS="-no-builtin -O2": $ time ./t/helper/test-prio-queue $(seq 100000 -1 1) dump >/dev/null real 0m0.170s user 0m0.156s sys 0m0.012s Hmm. Not nice, but also not prohibitively slow.
The commit you quoted embarrasses me, and I have no excuse for it. I would love to see that myswap() ugliness fixed by replacing it with a construct that is simpler, and generates good code even without any smart compiler.
I don't see a way to do that without adding a type parameter. René