On Wed, Dec 20, 2000 at 02:34:19AM -0600, Federico Mena Quintero <federico@xxxxxxxxxxxxx> wrote:
> Anyways, in libart and gdk-pixbuf we have code like this to composite
> an RGBA image over an RGB pixel:
>
> dest[0] = r2 + ((tmp + (tmp >> 8) + 0x80) >> 8);

Warning! Here is a formula I just came up with (about the same as above,
actually, but without rounding errors):

   x = ((n<<8) + n + 257)>>16;

It works over the full range (n = 0..65535; x = 0..257) and is always exact.

However, this optimization is not as important as you might think: gcc
already uses essentially this technique, except that it uses a
multiplication, since gcc's formula has to work over the full unsigned int
range ;)

For n/255, gcc does this on x86:

   movl %ebx,%eax
   mull .LC0                  ; = 0x80808081
   shrl $7,%edx

While my formula boils down to:

   movl %ebx,%eax
   sall $8,%eax
   leal 257(%ebx,%eax),%eax
   shrl $16,%eax

In practice, gcc's code is faster if enough registers are available
(p-ii/iii), and usually not slower. It is also correct over the full range.

So think twice before starting to "optimize" this division.

(And always remember to use UNSIGNED variables where applicable, since these
are much faster.)

-- 
      -----==-                                             |
      ----==-- _                                           |
      ---==---(_)__  __ ____  __       Marc Lehmann      +--
      --==---/ / _ \/ // /\ \/ /       pcg@xxxxxxxxxxxxx |e|
      -=====/_/_//_/\_,_/ /_/\_\       XX11-RIPE         --+
    The choice of a GNU generation                       |
                                                         |
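
The exactness claims in the message are easy to check by brute force. The following is a minimal sketch in C, not code from libart, gdk-pixbuf, or gcc. The function names (div255_shift, div255_mul, div255_check) are made up for illustration, and it assumes a C99 compiler with <stdint.h>. The multiply form is my reading of the gcc output quoted in the message: take the high 32 bits of n * 0x80808081 and shift right by 7.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add form from the message: floor(n / 255) for n in 0..65535. */
static uint32_t div255_shift(uint32_t n)
{
    return ((n << 8) + n + 257) >> 16;
}

/* Multiply-and-shift form corresponding to the quoted gcc output:
   mull leaves the high 32 bits of n * 0x80808081 in %edx, and
   shrl $7 completes a right shift by 39 bits in total. */
static uint32_t div255_mul(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0x80808081u) >> 39);
}

/* Compare both forms against plain unsigned division for every
   16-bit input; returns the number of values checked. */
static unsigned div255_check(void)
{
    unsigned checked = 0;
    for (uint32_t n = 0; n <= 65535; n++) {
        assert(div255_shift(n) == n / 255);
        assert(div255_mul(n) == n / 255);
        checked++;
    }
    return checked;
}
```

Calling div255_check() from main() should run through all 65536 inputs without tripping an assertion. Note that the shift-and-add form is only valid for 16-bit inputs, while the multiply form also holds for larger unsigned 32-bit values.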