Re: How to inline a huge m68k code?

ami_stuff <ami_stuff@xxxxx> · Mon, 13 Jul 2009 14:07:21 +0200

Hi,

> > I want:
> > 
> > 1. use the code which I posted as an inline, but I don't know how to inline
> > it correctly
> 
> There is a complete worked example just here, that Ian posted, with almost
> exactly the same arguments.  Can't you use it as a model?

Could you please modify it for me? I tried myself, but I am doing something wrong.
I can benchmark the code on a real 68060 (GCC inline vs. the code I posted)
with FFmpeg, so this will be a real-life test.

> > 2. if the code is already faster with FFmpeg as an externally linked object,
> > it will be even faster when inlined, so maybe the default GCC code should be
> > replaced with the code I posted?
> 
> Sure, but first we need to know why the code you posted is faster.  I can't
> immediately see why it should be.
> 
> Maybe it's something to do with sign extension.  It's possible that gcc doesn't
> use the umul_ppmm inline that Ian posted because you sign extend both args
> before multiplying.
> 
> This can easily be worked around, but we need to see the code gcc generates.

#include <stdint.h>

inline int MULH(int a, int b){
    return ((int64_t)(a) * (int64_t)(b))>>32;
}

Here is the asm output (GCC 4.4.0):

#NO_APP
	.text
	.even
	.globl	_MULH
_MULH:
	move.l d3,-(sp)
	move.l d2,-(sp)
	move.l 12(sp),d1
	smi d0
	extb.l d0
	move.l 16(sp),d3
	smi d2
	extb.l d2
	move.l d2,a0
	move.l d3,a1
	move.l a1,-(sp)
	move.l a0,-(sp)
	move.l d1,-(sp)
	move.l d0,-(sp)
	jsr ___muldi3
	lea (16,sp),sp
	move.l d0,d1
	smi d0
	extb.l d0
	move.l d1,d0
	move.l (sp)+,d2
	move.l (sp)+,d3
	rts
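
Note that the output above calls ___muldi3 for the full 64-bit multiply. For
comparison, below is a rough sketch in plain C of how the high 32 bits could be
computed from 16x16 partial products, without a 64-bit multiply. This is only
an illustration, not the routine discussed in this thread: the name mulh_16x16
is made up, and it assumes a 32-bit int and GCC's usual behaviour for
converting out-of-range unsigned values to signed. MULH is repeated so the
snippet is self-contained.

```c
#include <stdint.h>

/* The function from this post, used here as the reference result. */
static inline int MULH(int a, int b) {
    return (int)(((int64_t)a * (int64_t)b) >> 32);
}

/* Hypothetical alternative: high 32 bits of a signed 32x32 multiply,
 * built from four unsigned 16x16->32 partial products. */
static inline int32_t mulh_16x16(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t al = ua & 0xFFFFu, ah = ua >> 16;
    uint32_t bl = ub & 0xFFFFu, bh = ub >> 16;

    uint32_t ll = al * bl;   /* contributes to bits  0..31 */
    uint32_t lh = al * bh;   /* contributes to bits 16..47 */
    uint32_t hl = ah * bl;   /* contributes to bits 16..47 */
    uint32_t hh = ah * bh;   /* contributes to bits 32..63 */

    /* Sum everything that lands in bits 16..31 to get the carry into bit 32. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    /* High word of the unsigned product. */
    uint32_t hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);

    /* Correct the unsigned high word to the signed result (mod 2^32). */
    if (a < 0) hi -= ub;
    if (b < 0) hi -= ua;

    return (int32_t)hi;
}
```

Comparing mulh_16x16(a, b) against MULH(a, b) for a handful of values,
including negative ones and INT32_MIN, is a quick way to check that a
hand-written replacement gives the same results before benchmarking it.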

Regards