How to inline a huge m68k code?

ami_stuff <ami_stuff@xxxxx> · Sun, 12 Jul 2009 23:30:19 +0200

Hi,

I would like to ask for help with a large piece of m68k assembly code that I am trying to inline.

This is an optimized 32x32->64 multiply function for the 68060 processor (the 68020/030/040 CPUs
implement this operation in hardware).

I replaced the C function below, from FFmpeg's source code, with a linkable object generated
from the asm code, and the overall speedup of FFmpeg is already about 5-6% (when converting mp3 to wav)
on a real 68060 CPU:

av_always_inline int MULH(int a, int b){
    return ((int64_t)(a) * (int64_t)(b))>>32;
}
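
As a quick illustration of what the C version computes, here is a minimal standalone sketch. The name mulh_ref is made up for this example, and it assumes the compiler shifts negative 64-bit values arithmetically (GCC does):

```c
#include <stdint.h>

/* Same expression as MULH above; mulh_ref is a hypothetical name. */
static inline int mulh_ref(int a, int b)
{
    /* Widen both operands to 64 bits, multiply, keep the upper 32 bits.
       Assumes an arithmetic right shift for negative products. */
    return (int)(((int64_t)a * (int64_t)b) >> 32);
}
```

For instance, mulh_ref(0x40000000, 4) gives 1 because the product is exactly 2^32, and mulh_ref(-1, 1) gives -1 because every bit of the 64-bit product is set.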

Here is the m68k asm code:

MULH -- signed 32 by 32 bit multiply with 64 bit result.

result = MULH(arg1,arg2);
D0:D1         D0   D1

Returns the signed 64 bit result of multiplying arg1 by arg2.
arg1, arg2 - numbers to multiply
result - the signed 64 bit result of multiplying arg1 by arg2. 

XDEF _MULH

_MULH:
move.l d6,-(sp)
move.l d5,-(sp)
move.l d0,d6
beq.b return0
tst.l d1
beq.b return0
move.l d4,-(sp)
move.l d3,-(sp)
move.l d2,-(sp)
eor.l d1,d6
move.l d0,d2
bpl.b pos0
neg.l d2
neg.l d0
pos0:
move.l d1,d4
bpl.b pos1
neg.l d4
neg.l d1
pos1:
moveq #16,d5
move.l d0,d3
ror.l d5,d4
ror.l d5,d3
mulu.w d1,d0
mulu.w d3,d1
mulu.w d4,d2
mulu.w d4,d3
ror.l d5,d0
clr.l d4
add.w d1,d0
addx.l d4,d3
add.w d2,d0
addx.l d4,d3
lsr.l d5,d1
lsr.l d5,d2
add.l d3,d1
ror.l d5,d0
add.l d2,d1
tst.l d6
bpl.b xit
neg.l d0
negx.l d1
xit:
move.l (sp)+,d2
move.l (sp)+,d3
move.l (sp)+,d4
move.l (sp)+,d5
move.l (sp)+,d6
move.l d1,d0  ; line added by me: the upper 32 bits of the result go to d0
rts
return0:
clr.l d0
clr.l d1
movem.l (sp)+,d5/d6
rts 
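
For comparison, the same partial-product scheme can be sketched in plain C, which the compiler can inline and register-allocate on its own. This is only a model of the routine above, not something verified on a 68060: the name mulh_16x16 is made up, the early exit for zero operands is dropped, and whether GCC turns each 16x16 product into a single mulu.w is an assumption that needs checking in the generated assembly.

```c
#include <stdint.h>

/* Hypothetical name; a C model of the asm routine, returning the upper 32 bits. */
static inline int32_t mulh_16x16(int32_t a, int32_t b)
{
    /* Magnitudes of both operands (the neg.l pairs). Negating as unsigned
       keeps INT32_MIN well defined. */
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;

    uint32_t a_lo = ua & 0xFFFFu, a_hi = ua >> 16;
    uint32_t b_lo = ub & 0xFFFFu, b_hi = ub >> 16;

    /* The four 16x16->32 products (the four mulu.w instructions). */
    uint32_t ll = a_lo * b_lo;
    uint32_t hl = a_hi * b_lo;
    uint32_t lh = a_lo * b_hi;
    uint32_t hh = a_hi * b_hi;

    /* Sum the column for bits 16..31; its overflow carries into the
       upper word (the add.w / addx.l pairs). */
    uint32_t mid = (ll >> 16) + (hl & 0xFFFFu) + (lh & 0xFFFFu);
    uint32_t lo  = (mid << 16) | (ll & 0xFFFFu);
    uint32_t hi  = hh + (hl >> 16) + (lh >> 16) + (mid >> 16);

    /* Negate the 64-bit result when the operand signs differ
       (neg.l d0 / negx.l d1). */
    if ((a ^ b) < 0) {
        lo = 0u - lo;
        hi = ~hi + (lo == 0);
    }

    /* Converting back to signed assumes two's-complement wraparound, as in GCC. */
    return (int32_t)hi;
}
```

If the compiler output turns out worse than the hand-written routine, the same structure still shows which values would become operands of an inline asm block.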

Maybe this code could also be used in the longlong.h file for the 68060 CPU?
If so, please note that I added an asm line to move the upper 32 bits of the result
from d1 to d0, and this line should be removed.
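
To make the longlong.h idea a bit more concrete: its umul_ppmm(w1, w0, u, v) macro is unsigned and delivers both halves of the product, so the sign handling and the extra move would not be needed there. Below is a rough C sketch in that shape. The macro name UMUL_PPMM_16X16 and the helper umul_check are made up for illustration, and this is not the asm that would actually go into longlong.h:

```c
#include <stdint.h>

/* Hypothetical macro; the argument order (high, low, u, v) follows umul_ppmm. */
#define UMUL_PPMM_16X16(w1, w0, u, v)                                     \
    do {                                                                  \
        uint32_t u_ = (u), v_ = (v);                                      \
        uint32_t ul_ = u_ & 0xFFFFu, uh_ = u_ >> 16;                      \
        uint32_t vl_ = v_ & 0xFFFFu, vh_ = v_ >> 16;                      \
        uint32_t ll_ = ul_ * vl_, hl_ = uh_ * vl_, lh_ = ul_ * vh_;       \
        /* Column sum for bits 16..31, at most 17 bits wide */            \
        uint32_t mid_ = (ll_ >> 16) + (hl_ & 0xFFFFu) + (lh_ & 0xFFFFu);  \
        (w1) = uh_ * vh_ + (hl_ >> 16) + (lh_ >> 16) + (mid_ >> 16);      \
        (w0) = (mid_ << 16) | (ll_ & 0xFFFFu);                            \
    } while (0)

/* Hypothetical helper: glue both halves together so the result is easy to check. */
static inline uint64_t umul_check(uint32_t u, uint32_t v)
{
    uint32_t hi, lo;
    UMUL_PPMM_16X16(hi, lo, u, v);
    return ((uint64_t)hi << 32) | lo;
}
```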

Thanks for any help!

Regards