Re: divide by 255

Marc Lehmann <pcg@xxxxxxxx> · Wed, 20 Dec 2000 10:54:16 +0100

On Wed, Dec 20, 2000 at 02:34:19AM -0600, Federico Mena Quintero <federico@xxxxxxxxxxxxx> wrote:
> Anyways, in libart and gdk-pixbuf we have code like this to composite
> an RGBA image over an RGB pixel:
> 
> 		dest[0] = r2 + ((tmp + (tmp >> 8) + 0x80) >> 8);

Warning!

Here is a formula I just came up with (about the same as above, actually,
but without rounding errors):

x = ((n<<8) + n + 257)>>16;

It works over the full range (n = 0..65535; x = 0..257) and always gives
exactly n/255, truncated like C's integer division.

However, this optimization is not as important as you might think: gcc
already uses exactly this technique for division by a constant, except
that it uses a multiplication, since its formula has to work over the full
unsigned int range ;)

For n/255, gcc generates this on x86:

   movl %ebx,%eax
   mull .LC0 ; = 0x80808081
   shrl $7,%edx

While my formula boils down to:

   movl %ebx,%eax
   sall $8,%eax
   leal 257(%ebx,%eax),%eax
   shrl $16,%eax

In practice, gcc's code is faster when enough registers are available
(Pentium II/III), and usually no slower otherwise. It is also correct
over the full unsigned int range.

So think twice before starting to "optimize" this division.

(And always remember to use UNSIGNED variables where applicable, since
divisions on them compile to much faster code.)

-- 
      -----==-                                             |
      ----==-- _                                           |
      ---==---(_)__  __ ____  __       Marc Lehmann      +--
      --==---/ / _ \/ // /\ \/ /       pcg@xxxxxxxxxxxxx |e|
      -=====/_/_//_/\_,_/ /_/\_\       XX11-RIPE         --+
    The choice of a GNU generation                       |
                                                         |