On 05 Sep 2001 13:49:17 +0200, Sven Neumann wrote:

> you need to do some masking and shifting. Well, you asked for it, here's
> the code we use in DirectFB to blend 'color' with opacity 'a' to destination
> pixel 'd' (RGB32). This code handles the red and blue channels in one
> multiplication. The operation you asked for could be handled in a similar
> fashion with two multiplications instead of four. Of course such
> optimizations need to be properly benchmarked to see if they make sense.
>
>      __u32 __rb = (((color.r)<<16) | (color.b));
>      __u32 __g  = ((color.g)<<8);
>
>      switch (a) {\
>           case 0xff: *(d) = (0xff000000 | __rb | __g); \
>           case 0: break; \
>           default: {\
>                __u32 pixel = *(d);\
>                __u16 s = (a)+1;\
>                register __u32 t1,t2; \
>                t1 = (pixel&0x00ff00ff); t2 = (pixel&0x0000ff00); \
>                pixel = ((((__rb-t1)*s+(t1<<8)) & 0xff00ff00) + \
>                         ((( __g-t2)*s+(t2<<8)) & 0x00ff0000)) >> 8; \
>                *(d) = pixel;\
>           }\
>      }

Okay, so you work on two channels (red and blue) concurrently by using
the corresponding masks. However, this code isn't saturating but calculates
modulo because of the 0xff masks. It might be slightly faster on some
architectures, but I really doubt it is in general, because the masking and
shifting add quite some overhead compared to the simple case.

It's not exactly what I had in mind. I thought more of something like:
load 2*32 bits, then do the calculations on the corresponding data bytes of
the words concurrently, with saturation or modulo (with a single fast asm
mnemonic). It is possible to do that with MMX/AltiVec/other SIMD CPUs, but
that needs special code for every CPU.

Servus,
Daniel
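
For illustration only, here is a rough portable-C sketch of the two ideas discussed in this mail. blend_rgb32 restates the quoted DirectFB macro as a plain function; the function name and signature are assumptions made for this sketch, not DirectFB API. add_bytes_modulo and add_bytes_saturate are hypothetical helpers that add the four bytes of two 32-bit words concurrently without SIMD instructions, once with wraparound and once clamped to 0xff. Whether any of this beats the simple per-channel code would have to be benchmarked.

```c
#include <stdint.h>

/* Blend colour (r,g,b) with opacity a onto the RGB32 pixel d.
 * Red and blue share one multiplication, green uses a second one.
 * Hypothetical function form of the quoted macro. */
static uint32_t blend_rgb32(uint32_t d, uint8_t r, uint8_t g, uint8_t b,
                            uint8_t a)
{
    uint32_t rb = ((uint32_t)r << 16) | b;
    uint32_t gg = (uint32_t)g << 8;

    if (a == 0xff)
        return 0xff000000u | rb | gg;   /* fully opaque: overwrite */
    if (a == 0)
        return d;                       /* fully transparent: keep d */

    uint32_t s  = (uint32_t)a + 1u;     /* scale factor in 1..256 */
    uint32_t t1 = d & 0x00ff00ffu;      /* destination red and blue */
    uint32_t t2 = d & 0x0000ff00u;      /* destination green */

    /* Each 16-bit lane holds c*s + t*(256-s), which never exceeds 0xff00,
     * so the unsigned wraparound in (rb - t1) cancels out exactly. */
    return ((((rb - t1) * s + (t1 << 8)) & 0xff00ff00u) +
            (((gg - t2) * s + (t2 << 8)) & 0x00ff0000u)) >> 8;
}

/* Add the four bytes of x and y independently, wrapping modulo 256. */
static uint32_t add_bytes_modulo(uint32_t x, uint32_t y)
{
    /* Add the low 7 bits of each byte; carries stop at bit 7. */
    uint32_t low7 = (x & 0x7f7f7f7fu) + (y & 0x7f7f7f7fu);
    /* Fold in the top bits without carrying into the next byte. */
    return low7 ^ ((x ^ y) & 0x80808080u);
}

/* Add the four bytes of x and y independently, clamping each to 0xff. */
static uint32_t add_bytes_saturate(uint32_t x, uint32_t y)
{
    uint32_t low7  = (x & 0x7f7f7f7fu) + (y & 0x7f7f7f7fu);
    uint32_t sum   = low7 ^ ((x ^ y) & 0x80808080u);
    /* Carry out of bit 7 of each byte: majority of (x7, y7, carry-in). */
    uint32_t carry = ((x & y) | ((x | y) & low7)) & 0x80808080u;
    /* Expand each carry bit to a full 0xff byte and force those bytes. */
    return sum | ((carry >> 7) * 0xffu);
}
```

For example, adding 0x10f0ff80 and 0x20200180 byte by byte gives 0x30100000 with wraparound and 0x30ffffff with saturation. A single MMX or AltiVec saturating-add instruction does the same kind of per-byte operation on a wider register, which is why the portable version above needs several extra masking steps.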