On 07/09/2010 09:11 AM, Mathieu Lacage wrote:
> The attached C++ testcase compares the performance behavior of
> __int128_t used directly vs __int128_t used through an overloaded
> operator <. The overloaded < operator appears faster than the raw
> __int128_t, which I find really surprising, so I fear I am not
> measuring what I think I am measuring. Hints?
>
> [mathieu@mathieu-laptop benchmark-time]$ g++ --version
> g++ (GCC) 4.4.3 20100127 (Red Hat 4.4.3-4)
> [mathieu@mathieu-laptop benchmark-time]$ g++ -O3 test.cc
>
> # run raw __int128_t version
> [mathieu@mathieu-laptop benchmark-time]$ time -p ./a.out 100000002 a
> 16384
> 2
> real 0.60
> user 0.60
> sys 0.00
>
> # run operator < version
> [mathieu@mathieu-laptop benchmark-time]$ time -p ./a.out 100000002 test
> 16384
> 2
> real 0.40
> user 0.40
> sys 0.00

g++ seems to be generating a specialization of run_cmp() in the
__int128_t case, with the parameters a and b fixed at a=1 and b=2, in an
attempt to do some constant propagation. This ought to help, but
unfortunately the back-end generates worse code for the specialized
case. This isn't uncommon in optimizing compilers: you do something
that usually improves code quality but occasionally makes things worse.

If you compile with -fdump-tree-optimized, you'll see what is happening.

Andrew.
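
A minimal sketch of the kind of harness being described, for illustration
only: it is not the attached test.cc (which is not reproduced here).
run_cmp, its parameters a and b, and the constants 1 and 2 are taken from
the reply above; the wrapper type, the template, and main() are assumptions
made purely to show the two call styles side by side.

#include <cstdio>
#include <cstdlib>
#include <cstring>

// Thin wrapper whose operator< forwards to the same __int128_t comparison.
struct Wrapped {
    __int128_t v;
};

static inline bool operator<(const Wrapped &x, const Wrapped &y)
{
    return x.v < y.v;
}

// The loop under test.  With the raw __int128_t instantiation, g++ may emit
// an extra, specialized copy of this function with a and b replaced by the
// constants it was called with (the specialization described above).
template <typename T>
static unsigned long run_cmp(T a, T b, unsigned long n)
{
    unsigned long hits = 0;
    for (unsigned long i = 0; i < n; i++)
        if (a < b)
            hits++;
    // NB: a real benchmark has to keep this loop from being folded away,
    // e.g. by varying the operands; this sketch only shows the call styles.
    return hits;
}

int main(int argc, char **argv)
{
    unsigned long n = argc > 1 ? strtoul(argv[1], 0, 10) : 100000002UL;
    unsigned long r;
    if (argc > 2 && strcmp(argv[2], "a") == 0) {
        r = run_cmp<__int128_t>(1, 2, n);   // raw __int128_t version
    } else {
        Wrapped a = { 1 }, b = { 2 };
        r = run_cmp<Wrapped>(a, b, n);      // overloaded operator< version
    }
    printf("%lu\n", r);
    return 0;
}

Compiling this with "g++ -O3 -fdump-tree-optimized test.cc" writes a dump of
the final tree/GIMPLE form next to the source (a file named along the lines
of test.cc.NNNt.optimized; the exact number depends on the GCC version). In
the raw __int128_t case the dump should contain a second, specialized copy
of run_cmp in which a and b have been replaced by the constants 1 and 2.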