Hi,

> > I want:
> >
> > 1. use the code which I posted as an inline, but I don't know how to inline
> > it correctly
>
> There is a complete worked example just here, that Ian posted, with almost
> exactly the same arguments. Can't you use it as a model?

Could you please modify it for me? I tried it myself, but I am doing something
wrong. I can benchmark the code on a real 68060 (GCC inline vs. the code I
posted) with FFmpeg, so this will be a real-life test.

> > 2. if the code is faster already with FFmpeg as an external link object, it
> > will be even faster when inlined, so maybe default GCC code should be
> > replaced with code I posted?
>
> Sure, but first we need to know why the code you posted is faster. I can't
> immediately see why it should be.
>
> Maybe it's something to do with sign extension. It's possible that gcc doesn't
> use the umul_ppmm inline that Ian posted because you sign extend both args
> before multiplying.
>
> This can easily be worked around, but we need to see the code gcc generates.

Here is the C source:

#include <stdint.h>

inline int MULH(int a, int b){
    return ((int64_t)(a) * (int64_t)(b))>>32;
}

Here is the asm output (GCC 4.4.0):

#NO_APP
	.text
	.even
	.globl	_MULH
_MULH:
	move.l d3,-(sp)
	move.l d2,-(sp)
	move.l 12(sp),d1
	smi d0
	extb.l d0
	move.l 16(sp),d3
	smi d2
	extb.l d2
	move.l d2,a0
	move.l d3,a1
	move.l a1,-(sp)
	move.l a0,-(sp)
	move.l d1,-(sp)
	move.l d0,-(sp)
	jsr ___muldi3
	lea (16,sp),sp
	move.l d0,d1
	smi d0
	extb.l d0
	move.l d1,d0
	move.l (sp)+,d2
	move.l (sp)+,d3
	rts

So both arguments are sign-extended to 64 bits and the multiply is done by a
call to ___muldi3, not inline.

Regards
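P.S. To make the sign-extension point concrete, here is a minimal C sketch of what MULH has to compute, written without any 64-bit multiply. It is only an illustration: it is not the code posted earlier in the thread, not Ian's umul_ppmm inline, and not what ___muldi3 does internally. The names mulh_ref, umulh32 and mulh_parts are hypothetical. The sketch assumes a GCC-style target where >> on a negative value is an arithmetic shift and where converting an out-of-range uint32_t to int32_t wraps.

```c
#include <stdint.h>

/* Reference: same expression as MULH above, using a 64-bit product. */
static int32_t mulh_ref(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* High 32 bits of the unsigned 32x32 product, built from four
 * 16x16 -> 32 partial products. */
static uint32_t umulh32(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    uint32_t ll = al * bl;
    uint32_t lh = al * bh;
    uint32_t hl = ah * bl;
    uint32_t hh = ah * bh;

    /* Carry out of bits 16..31 of the full product; the sum of three
     * 16-bit quantities cannot overflow 32 bits. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    return hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}

/* Signed high word: take the unsigned high word and subtract the other
 * operand once for each negative input (arithmetic modulo 2^32). */
static int32_t mulh_parts(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t hi = umulh32(ua, ub);

    if (a < 0) hi -= ub;
    if (b < 0) hi -= ua;

    /* Assumption: this conversion wraps, as it does with GCC. */
    return (int32_t)hi;
}
```

The two "if" lines are the whole cost of signedness here, which is why the sign extension by itself should not force a library call. Whether a variant like this beats ___muldi3 on a real 68060 is exactly what the FFmpeg benchmark would have to show.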