Re: How to inline a huge m68k code?

Andrew Haley <aph@xxxxxxxxxx> · Mon, 13 Jul 2009 13:52:36 +0100

ami_stuff wrote:
> Hi,
> 
>>> I want:
>>>
>>> 1. use the code which I posted as an inline, but I don't know how to inline
>>> it correctly
>> There is a complete worked example just here, that Ian posted, with almost
>> exactly the same arguments.  Can't you use it as a model?
> 
> Could you please modify it for me? I tried myself, but I am doing something wrong.
> I can benchmark the code on a real 68060 (GCC inline vs. the code I posted)
> with FFmpeg, so this will be a real-life test.
> 
>>> 2. if the code is already faster with FFmpeg as an externally linked object,
>>> it will be even faster when inlined, so maybe the default GCC code should be
>>> replaced with the code I posted?
>> Sure, but first we need to know why the code you posted is faster.  I can't
>> immediately see why it should be.
>>
>> Maybe it's something to do with sign extension.  It's possible that gcc doesn't
>> use the umul_ppmm inline that Ian posted because you sign extend both args
>> before multiplying.
>>
>> This can easily be worked around, but we need to see the code gcc generates.
> 
> #include <stdint.h>
> 
> inline int MULH(int a, int b){
>     return ((int64_t)(a) * (int64_t)(b))>>32;
> }
> 
> Here is the asm output (GCC 4.4.0):
> 
> #NO_APP
> 	.text
> 	.even
> 	.globl	_MULH
> _MULH:
> 	move.l d3,-(sp)
> 	move.l d2,-(sp)
> 	move.l 12(sp),d1
> 	smi d0
> 	extb.l d0
> 	move.l 16(sp),d3
> 	smi d2
> 	extb.l d2
> 	move.l d2,a0
> 	move.l d3,a1
> 	move.l a1,-(sp)
> 	move.l a0,-(sp)
> 	move.l d1,-(sp)
> 	move.l d0,-(sp)
> 	jsr ___muldi3
> 	lea (16,sp),sp
> 	move.l d0,d1
> 	smi d0
> 	extb.l d0
> 	move.l d1,d0
> 	move.l (sp)+,d2
> 	move.l (sp)+,d3
> 	rts

Right, so the problem is that the unsigned inline code Ian posted is never
used: gcc sign-extends both arguments and calls ___muldi3 instead.  Maybe
using unsigned arithmetic would work better.  Try this:

#include <stdint.h>

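inline int xxMULH(int a, int b)
{
  /* Reinterpret the operands as unsigned so that gcc sees an unsigned
     32x32->64 multiply.  */
  uint32_t au = a;
  uint32_t bu = b;

  uint64_t res = (uint64_t)au * (uint64_t)bu;
  uint32_t res2 = res >> 32;   /* high word of the unsigned product */

  /* A negative a reads as a + 2^32 when unsigned, which adds bu to the
     high word; likewise for b.  Undo both, modulo 2^32.  */
  if (a < 0)
    res2 -= bu;
  if (b < 0)
    res2 -= au;

  return (int)res2;
}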
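
To check that the unsigned version agrees with the original MULH before
benchmarking, something like the following sketch should do.  The names
mulh_signed, mulh_unsigned and check_mulh are only illustrative (they are
not from anything posted in this thread), and the code assumes gcc's usual
behaviour: arithmetic right shift of negative values and wrap-around when
an out-of-range unsigned value is converted to int.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: high word of the signed 64-bit product (the original MULH).
   Assumes >> on a negative int64_t is an arithmetic shift, as with gcc. */
static int mulh_signed(int a, int b)
{
  return (int)(((int64_t)a * (int64_t)b) >> 32);
}

/* Unsigned multiply followed by the sign correction. */
static int mulh_unsigned(int a, int b)
{
  uint32_t au = (uint32_t)a;
  uint32_t bu = (uint32_t)b;
  uint32_t hi = (uint32_t)(((uint64_t)au * (uint64_t)bu) >> 32);

  if (a < 0)
    hi -= bu;   /* undo the extra 2^32 * bu */
  if (b < 0)
    hi -= au;   /* undo the extra 2^32 * au */

  return (int)hi;
}

/* Compare both versions on every pair drawn from a few edge-case values.
   Aborts via assert on the first mismatch. */
static void check_mulh(void)
{
  static const int v[] = {
    0, 1, -1, 2, -2, 65536, -65536,
    123456789, -987654321, INT32_MAX, INT32_MIN
  };
  const int n = (int)(sizeof v / sizeof v[0]);

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      assert(mulh_signed(v[i], v[j]) == mulh_unsigned(v[i], v[j]));
}
```

Call check_mulh() once from main() in a build without -DNDEBUG; if it
returns, the two versions agree on all of the pairs listed.  That says
nothing about speed, of course: for that you still need to look at the
code gcc generates.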

Andrew.