Re: Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats

Nicolas Robidoux <nrobidoux@xxxxxxxxxxxxxxxx> · Sat, 13 Sep 2008 10:34:18 -0400 (EDT)

Short postscript about the benchmark:

I got worried that with a scaling factor of 20, since we recompute the exact same pixel coefficients about 400 times, the chip's branch prediction may be performing better than it typically would with more reasonable enlargement ratios.

So, I redid the tests with a scaling factor of 1.17 instead of 20.

This time, arithmetic branching using C99/gcc math intrinsics performed just a little better than if-then-else, reversing the result of the previous test, although the difference is not statistically significant.

In any case, the overall conclusion seems to be: program with what you like best, at least on Intel chips (on the Cell processor, say, things may be different).

================
Average timings:
================

stock gegl-sampler-linear scale:

.541 = ( .525 + .607 + .517 + .516 ) / 4

gegl-sampler-yafr with if-then-else branching and no use of intrinsics:

.558 =
( .548 + .544 + .567 + .548 + .549 + .546 + .570 + .545 + .614 + .544 + .559 
+ .549 + .567 + .545 + .565 + .570 ) / 16

gegl-sampler-yafr performing arithmetic branching with fabsf, copysignf and fminf:

.551 = 
( .565 + .550 + .550 + .546 + .550 + .549 + .546 + .567 + .566 + .545 + .548 
+ .550 + .549 + .548 + .546 + .550 ) / 16
------------------------------------------------------------------------

Also, there were some small typos in the code snippets I emailed (they were
hand-edited versions of the real code, hence the typos). Here are cleaned-up
versions. (The following is for reference, really.)

Note that the two versions are scaled slightly differently (.5 vs .25); this
scaling is taken care of at no cost elsewhere in the real code.

Within each version of yafr, 16 code segments resembling the following
(note the ?: operators; this is the version with branching):

  const gfloat prem_squared = prem * prem;
  const gfloat deux_squared = deux * deux;
  const gfloat troi_squared = troi * troi;
  const gfloat prem_times_deux = prem * deux;
  const gfloat deux_times_troi = deux * troi;
  const gfloat deux_squared_minus_prem_squared = deux_squared - prem_squared;
  const gfloat troi_squared_minus_deux_squared = troi_squared - deux_squared;
  const gfloat prem_vs_deux =
    deux_squared_minus_prem_squared > (gfloat) 0. ? prem : deux;
  const gfloat deux_vs_troi =
    troi_squared_minus_deux_squared > (gfloat) 0. ? deux : troi;
  const gfloat my__up =
    prem_times_deux > (gfloat) 0. ? prem_vs_deux : (gfloat) 0.;
  const gfloat my_dow =
    deux_times_troi > (gfloat) 0. ? deux_vs_troi : (gfloat) 0.;

were replaced by (this is the version with arithmetic branching):

  const gfloat abs_prem = fabsf( prem );
  const gfloat abs_deux = fabsf( deux );
  const gfloat abs_troi = fabsf( troi );
  const gfloat prem_vs_deux = __builtin_fminf( abs_prem, abs_deux );
  const gfloat deux_vs_troi = __builtin_fminf( abs_deux, abs_troi );
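  /* copysignf( x, y ) returns the magnitude of x with the sign of y,
     so the sign (+1 or -1) of prem is copysignf( 1, prem ). */
  const gfloat sign_prem = copysignf( (gfloat) 1., prem );
  const gfloat sign_deux = copysignf( (gfloat) 1., deux );
  const gfloat sign_troi = copysignf( (gfloat) 1., troi );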
  const gfloat my__up =
    ( sign_prem * sign_deux + (gfloat) 1. ) * prem_vs_deux;
  const gfloat my_dow =
    ( sign_deux * sign_troi + (gfloat) 1. ) * deux_vs_troi;

Basically, what the code snippets do is this:

If prem and deux have the same sign, put the smaller one (in absolute
value) in my__up. Otherwise, set my__up to zero. Do likewise with
deux, troi and my_dow. The above two code snippets represent the best
ways of performing this that I could figure out.

Nicolas Robidoux
Laurentian University/Universite Laurentienne