ami_stuff wrote:
> Hi,
>
>>> I want:
>>>
>>> 1. use the code which I posted as an inline, but I don't know how to inline
>>> it correctly
>> There is a complete worked example just here, that Ian posted, with almost
>> exactly the same arguments. Can't you use it as a model?
>
> Could you please modify it for me? I tried myself, but I do something wrong.
> I can benchmark the code on the real 68060 (GCC inline vs code posted by me)
> with FFmpeg, so this will be a real life test.
>
>>> 2. if the code is faster already with FFmpeg as an external link object, it
>>> will be even faster when inlined, so maybe default GCC code should be
>>> replaced with code I posted?
>> Sure, but first we need to know why the code you posted is faster. I can't
>> immediately see why it should be.
>>
>> Maybe it's something to do with sign extension. It's possible that gcc doesn't
>> use the umul_ppmm inline that Ian posted because you sign extend both args
>> before multiplying.
>>
>> This can easily be worked around, but we need to see the code gcc generates.
>
> #include <stdint.h>
>
> inline int MULH(int a, int b){
>     return ((int64_t)(a) * (int64_t)(b))>>32;
> }
>
> Here is asm output (GCC 4.4.0):
>
> #NO_APP
>         .text
>         .even
>         .globl  _MULH
> _MULH:
>         move.l d3,-(sp)
>         move.l d2,-(sp)
>         move.l 12(sp),d1
>         smi d0
>         extb.l d0
>         move.l 16(sp),d3
>         smi d2
>         extb.l d2
>         move.l d2,a0
>         move.l d3,a1
>         move.l a1,-(sp)
>         move.l a0,-(sp)
>         move.l d1,-(sp)
>         move.l d0,-(sp)
>         jsr ___muldi3
>         lea (16,sp),sp
>         move.l d0,d1
>         smi d0
>         extb.l d0
>         move.l d1,d0
>         move.l (sp)+,d2
>         move.l (sp)+,d3
>         rts

Right, so the problem is that the unsigned inline code Ian posted is never
used: the 64-bit signed multiply goes through a call to ___muldi3 instead.
Maybe using unsigned arithmetic would work better. Try this:

#include <stdint.h>

inline int xxMULH(int a, int b)
{
  uint32_t au = a;
  uint32_t bu = b;

  uint64_t res = (uint64_t)au * (uint64_t)bu;
  uint32_t res2 = res >> 32;

  if (a < 0)
    res2 -= bu;
  if (b < 0)
    res2 -= au;

  return (int)res2;
}

Andrew.
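
A note on why the two subtractions in xxMULH should give the signed result:
reinterpreting a negative 32-bit a as unsigned yields au = a + 2^32, so
a*b = au*bu - 2^32*([a<0]*bu + [b<0]*au) + 2^64*[a<0]*[b<0]. After shifting
right by 32 and reducing modulo 2^32, the last term vanishes and the high word
of au*bu only needs bu and/or au subtracted. The sketch below is one way to
check that against the plain int64_t version on a few edge cases. It is not
part of the original post: the names mulh_ref, mulh_unsigned and
mulh_self_check are made up for illustration, and it assumes a 32-bit int,
an arithmetic right shift for negative int64_t, and wrapping unsigned-to-signed
conversion (all true for GCC, but implementation-defined in ISO C).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference: widen to 64 bits, signed multiply, keep the high word.
   Assumes >> on a negative int64_t is an arithmetic shift (as in GCC). */
static int mulh_ref(int a, int b)
{
    return (int)(((int64_t)a * (int64_t)b) >> 32);
}

/* Unsigned multiply plus correction, same logic as xxMULH in the post
   (hypothetical name, used only for this sketch). */
static int mulh_unsigned(int a, int b)
{
    uint32_t au = (uint32_t)a;
    uint32_t bu = (uint32_t)b;
    uint32_t hi = (uint32_t)(((uint64_t)au * (uint64_t)bu) >> 32);

    if (a < 0)
        hi -= bu;   /* undo the extra 2^32 * bu contributed by au */
    if (b < 0)
        hi -= au;   /* undo the extra 2^32 * au contributed by bu */

    return (int)hi; /* assumes wrapping conversion, as GCC does */
}

/* Compare both versions on every pair from a small set of edge cases. */
static void mulh_self_check(void)
{
    static const int v[] = {
        0, 1, -1, 2, -2, 65535, 65536, -65536,
        123456789, -123456789, INT32_MAX, INT32_MIN
    };
    const size_t n = sizeof v / sizeof v[0];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert(mulh_unsigned(v[i], v[j]) == mulh_ref(v[i], v[j]));
}
```

Calling mulh_self_check() from main() only shows that the two versions agree on
these inputs. Whether the unsigned form is actually faster on the 68060 still
has to be read off the generated asm and the FFmpeg benchmark.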