I just completed a quick and dirty benchmark comparing arithmetic branching (using C99/gcc intrinsics) within the yafr sampler code against the standard C "if then else".

These tests were performed on a Thinkpad T60p with an Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz and 2025MiB of memory, running 2.6.24-19-generic #1 SMP by way of a pretty standard Ubuntu 8.04.

Warning: There seems to be something wrong with math.h in the current version of gcc, as suggested by some recent bug postings. For example, according to the gcc documentation, I should not have to prefix fminf with __builtin_. Consequently, the benchmark results may soon become irrelevant.

Second warning: If my memory is good, Intel chips have a good and fast implementation of the "? :" branching construct (it comes down to selecting which register to copy into another), as well as good branch prediction. My code without intrinsics is structured to take advantage of this.

Third warning: I have not optimized by looking at the assembler output of gcc, and I have done no optimization of the "arithmetic branching" version of the code. In particular, I have not used fmaf, even though my code is peppered with opportunities to use it (this may not be a big deal: apparently, gcc attempts to spot opportunities to use fused multiply-add).

------------------------------
quick description of the test:
------------------------------

I ran a bunch of consecutive scalings (times 20) of a digital photograph with initial dimensions 200x133, driving the gegl scale through an xml file analogous to the ones in gegl/docs/gallery. I alternated between the "with branching" and "arithmetic branching with intrinsics" versions, and threw in four scalings with the gegl stock linear.
-------------------------------------------------
Differences between the two versions of the code:
-------------------------------------------------

16 code segments resembling the following (note the "?:"; this is the version with branching):

  const gfloat prem_squared = prem * prem;
  const gfloat deux_squared = deux * deux;
  const gfloat troi_squared = troi * troi;
  const gfloat prem_times_deux = prem * deux;
  const gfloat deux_times_troi = deux * troi;
  const gfloat deux_squared_minus_prem_squared =
    deux_squared - prem_squared;
  const gfloat troi_squared_minus_deux_squared =
    troi_squared - deux_squared;
  /* Pick the operand with the smaller absolute value. */
  const gfloat prem_vs_deux =
    deux_squared_minus_prem_squared > (gfloat) 0. ? prem : deux;
  const gfloat deux_vs_troi =
    troi_squared_minus_deux_squared > (gfloat) 0. ? deux : troi;
  /* Keep it only if the two operands have the same sign. */
  const gfloat my__up =
    prem_times_deux > (gfloat) 0. ? prem_vs_deux : (gfloat) 0.;
  const gfloat my_dow =
    deux_times_troi > (gfloat) 0. ? deux_vs_troi : (gfloat) 0.;

were replaced by (this is the version with arithmetic branching):

  const gfloat abs_prem = fabsf( prem );
  const gfloat abs_deux = fabsf( deux );
  const gfloat abs_troi = fabsf( troi );
  const gfloat prem_vs_deux = __builtin_fminf( abs_prem, abs_deux );
  const gfloat deux_vs_troi = __builtin_fminf( abs_deux, abs_troi );
  /* copysignf( x, y ) returns the magnitude of x with the sign of y. */
  const gfloat sign_prem = copysignf( (gfloat) 1., prem );
  const gfloat sign_deux = copysignf( (gfloat) 1., deux );
  const gfloat sign_troi = copysignf( (gfloat) 1., troi );
  /* The sign product is +1 for equal signs and -1 otherwise. */
  const gfloat my__up =
    ( sign_prem * sign_deux + (gfloat) 1. ) * prem_vs_deux;
  const gfloat my_dow =
    ( sign_deux * sign_troi + (gfloat) 1. ) * deux_vs_troi;

Basically, what the code snippets do is this: If prem and deux have the same sign, put the smallest one (in absolute value) in my__up. Otherwise, set my__up to zero. Do likewise with deux, troi and my_dow.

The above two code snippets represent the best ways of performing this that I could figure.
===================
Overall conclusion:
===================

Arithmetic branching (without other improvements) does not appear to be worth the trouble.

================
Average timings:
================

stock gegl linear scale:

  47.50 = ( 47.474 + 47.581 + 47.345 + 47.595 ) / 4

gegl yafr with ? branching and no use of intrinsics:

  52.58 = ( 52.422 + 52.479 + 52.748 + 52.501 +
            52.680 + 52.623 + 52.537 + 52.518 +
            52.576 + 52.487 + 52.542 + 52.485 +
            52.645 + 52.810 + 52.667 + 52.554 ) / 16

gegl yafr performing arithmetic branching with fabsf, copysignf and fminf:

  52.70 = ( 52.568 + 52.447 + 52.763 + 52.524 +
            52.772 + 52.652 + 52.524 + 52.765 +
            52.596 + 52.850 + 52.733 + 52.799 +
            52.627 + 52.897 + 52.871 + 52.866 ) / 16

As you can see, the "?" version is slightly faster overall (by roughly 0.2% on average). The difference is probably not significant, but it certainly does not suggest that arithmetic branching is worth the hassle.

Nicolas Robidoux
Laurentian University/Universite Laurentienne