Re: [Gimp-developer] Solaris 64bit compile

Daniel Egger <egger@xxxxxxx> · 05 Sep 2001 16:56:31 +0200

On 05 Sep 2001 13:49:17 +0200, Sven Neumann wrote:

> you need to do some masking and shifting. Well, you asked for it, here's
> the code we use in DirectFB to blend 'color' with opacity 'a' to destination
> pixel 'd' (RGB32). This code handles the red and blue channels in one
> multiplication. The operation you asked for could be handled in a similar
> fashion with two multiplications instead of four. Of course such 
> optimizations need to be properly benchmarked to see if they make sense.

>      __u32 __rb = (((color.r)<<16) | (color.b));  /* red and blue lanes */
>      __u32 __g  =  ((color.g)<<8);                /* green lane */
> 
>      switch (a) {
>      case 0xff:   /* opaque: store the color (falls through to break) */
>           *(d) = (0xff000000 | __rb | __g);
>      case 0:      /* transparent: leave the destination alone */
>           break;
>      default: {
>           __u32 pixel = *(d);
>           __u16 s = (a)+1;
>           register __u32 t1, t2;
>           t1 = (pixel&0x00ff00ff); t2 = (pixel&0x0000ff00);
>           /* d + (color - d) * (a+1) / 256; red and blue share one multiply */
>           pixel = ((((__rb-t1)*s+(t1<<8)) & 0xff00ff00) +
>                    ((( __g-t2)*s+(t2<<8)) & 0x00ff0000)) >> 8;
>           *(d) = pixel;
>           }
>      }
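
A self-contained version of the quoted blend, for trying it outside DirectFB. This is a sketch under assumptions: blend_rgb32 is a hypothetical name, uint32_t stands in for __u32, and the color is passed as separate r, g, b bytes rather than DirectFB's color struct. Each channel comes out as (c * (a+1) + t * (255-a)) >> 8, where c is the color channel and t the old destination channel; as in the quoted code, the alpha byte is cleared in the blended case.

```c
#include <stdint.h>

/* Hypothetical stand-alone port of the quoted snippet: blend color (r, g, b)
   over the RGB32 pixel *d with opacity a, red and blue in one multiply. */
static void blend_rgb32(uint32_t *d, uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    uint32_t rb = ((uint32_t)r << 16) | b;   /* red and blue, 16 bits apart */
    uint32_t gg = (uint32_t)g << 8;          /* green in the middle byte */

    switch (a) {
    case 0xff:                               /* opaque: store the color */
        *d = 0xff000000u | rb | gg;
        break;
    case 0:                                  /* transparent: nothing to do */
        break;
    default: {
        uint32_t pixel = *d;
        uint32_t s = (uint32_t)a + 1;        /* 2..255; >> 8 divides by 256 */
        uint32_t t1 = pixel & 0x00ff00ffu;   /* destination red and blue */
        uint32_t t2 = pixel & 0x0000ff00u;   /* destination green */
        /* Per lane, (c - t) * s + (t << 8) equals c * s + t * (256 - s),
           which is at most 255 * 256, so no lane spills into its neighbour.
           The masks keep the high byte of each 16-bit product. */
        *d = ((((rb - t1) * s + (t1 << 8)) & 0xff00ff00u) +
              (((gg - t2) * s + (t2 << 8)) & 0x00ff0000u)) >> 8;
        break;
    }
    }
}
```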

Okay, so you work on two channels (red and blue) at once by using
corresponding masks. However, this code isn't saturating; because of the
0xff masks it calculates modulo 256. It might be slightly faster on some
architectures, but I really doubt it is in general, because the extra
masking and shifting adds quite some overhead compared to the simple
per-channel case. It's not exactly what I had in mind; I was thinking
more of:

  1. load two 32-bit words,
  2. operate on the corresponding bytes of both words concurrently, with
     saturation or modulo arithmetic, using a single fast instruction.
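
Without such an instruction, the byte-wise operations can be emulated in portable C by masking (sometimes called SIMD within a register). The sketch below is illustrative, not code from GIMP or DirectFB, and add_bytes_mod / add_bytes_sat are hypothetical names. It adds the four bytes of two 32-bit words lane by lane, once wrapping modulo 256 and once clamping at 0xff; the extra masking it needs is the kind of overhead a dedicated instruction avoids.

```c
#include <stdint.h>

#define LOW7 0x7f7f7f7fu  /* low seven bits of every byte */
#define HIGH 0x80808080u  /* top bit of every byte */

/* Add x and y byte by byte, modulo 256, without carries crossing lanes. */
static uint32_t add_bytes_mod(uint32_t x, uint32_t y)
{
    uint32_t low = (x & LOW7) + (y & LOW7);  /* at most 0xfe per byte */
    return low ^ ((x ^ y) & HIGH);           /* put back bit 7 of each byte */
}

/* Add x and y byte by byte, clamping each byte to 0xff instead of wrapping. */
static uint32_t add_bytes_sat(uint32_t x, uint32_t y)
{
    uint32_t sum = add_bytes_mod(x, y);
    /* Carry out of bit 7: both top bits set, or one set and the sum's clear. */
    uint32_t carry = ((x & y) | ((x | y) & ~sum)) & HIGH;
    uint32_t mask = (carry >> 7) * 0xffu;    /* 0xff in every overflowed byte */
    return sum | mask;
}
```

With real pixel data, the two words would typically be loaded from the source and destination buffers and the result stored back, one 32-bit word (four bytes) per call.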

It is possible to do that with MMX, AltiVec or other SIMD instruction
sets, but that needs special code for every CPU.
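
As an illustration of that per-CPU code (a sketch, not from this thread): SSE2, the x86 extension that followed MMX, provides the intrinsic _mm_adds_epu8, which does 16 saturating unsigned byte additions in one instruction. The function name add_sat_u8 is hypothetical; targets without SSE2 fall back to a plain loop, and an AltiVec build would need yet another branch (for example with vec_adds).

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* dst[i] = min(a[i] + b[i], 255) for i in [0, n). */
static void add_sat_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* 16 bytes per iteration with one saturating-add instruction. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Portable tail (or the whole job on CPUs without SSE2). */
    for (; i < n; i++) {
        unsigned s = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(s > 255 ? 255 : s);
    }
}
```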

Cheers,
       Daniel