ami_stuff wrote:
>> I don't understand what your actual question is.  What do you want gcc
>> to do differently?
>>
>> gcc's longlong.h already has inlined assembly code for 32x32->64
>> multiplication.  For the 68060 it looks like this:
>>
>> #define umul_ppmm(xh, xl, a, b) \
>>   __asm__ ("| Inlined umul_ppmm\n" \
>>   " move%.l %2,%/d0\n" \
>>   " move%.l %3,%/d1\n" \
>>   " move%.l %/d0,%/d2\n" \
>>   " swap %/d0\n" \
>>   " move%.l %/d1,%/d3\n" \
>>   " swap %/d1\n" \
>>   " move%.w %/d2,%/d4\n" \
>>   " mulu %/d3,%/d4\n" \
>>   " mulu %/d1,%/d2\n" \
>>   " mulu %/d0,%/d3\n" \
>>   " mulu %/d0,%/d1\n" \
>>   " move%.l %/d4,%/d0\n" \
>>   " eor%.w %/d0,%/d0\n" \
>>   " swap %/d0\n" \
>>   " add%.l %/d0,%/d2\n" \
>>   " add%.l %/d3,%/d2\n" \
>>   " jcc 1f\n" \
>>   " add%.l %#65536,%/d1\n" \
>>   "1: swap %/d2\n" \
>>   " moveq %#0,%/d0\n" \
>>   " move%.w %/d2,%/d0\n" \
>>   " move%.w %/d4,%/d2\n" \
>>   " move%.l %/d2,%1\n" \
>>   " add%.l %/d1,%/d0\n" \
>>   " move%.l %/d0,%0" \
>>   : "=g" ((USItype) (xh)), \
>>     "=g" ((USItype) (xl)) \
>>   : "g" ((USItype) (a)), \
>>     "g" ((USItype) (b)) \
>>   : "d0", "d1", "d2", "d3", "d4")
>>
>
> But this code is slow compared to code I posted.

It looks much the same to me, four MULUs and a bunch of carry
propagation.  What would you like to change?

> FFmpeg linked with object generated from asm code I posted is 5% faster
> (mp3 -> wav) on the real 68060 compared to default GCC asm code from
> longlong.h.
>
> This audio decoder uses MULH:
>
> http://gnunet.org/libextractor/doxygen/html/mpegaudiodec_8c-source.html
>
> I want:
>
> 1. use the code which I posted as an inline, but I don't know how to inline
> it correctly

There is a complete worked example just here, that Ian posted, with
almost exactly the same arguments.  Can't you use it as a model?

> 2. if the code is faster already with FFmpeg as an external link object, it
> will be even faster when inlined, so maybe default GCC code should be
> replaced with code I posted?

Sure, but first we need to know why the code you posted is faster.  I
can't immediately see why it should be.

Maybe it's something to do with sign extension.  It's possible that gcc
doesn't use the umul_ppmm inline that Ian posted because you sign
extend both args before multiplying.  This can easily be worked around,
but we need to see the code gcc generates.

Andrew.
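
P.S. To make the sign-extension point concrete, here is a rough C sketch,
not the longlong.h code and not the code from FFmpeg.  It builds the
unsigned 32x32->64 product from four 16x16 partial products (the same
decomposition as the four MULUs above), then derives the signed high
word from the unsigned one by subtracting a correction term.  All the
names (umul_ppmm_c, mulh_ref, mulh_from_unsigned, self_test) are made up
for illustration, and I am assuming that MULH means "high 32 bits of the
signed 64-bit product".  Whether any of this accounts for the 5%
difference is exactly what we still need to find out.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 32x32->64 multiply from four 16x16->32 partial products. */
static void umul_ppmm_c(uint32_t *hi, uint32_t *lo, uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xffffu, ah = a >> 16;
    uint32_t bl = b & 0xffffu, bh = b >> 16;

    uint32_t ll = al * bl;              /* weight 2^0  */
    uint32_t lh = al * bh;              /* weight 2^16 */
    uint32_t hl = ah * bl;              /* weight 2^16 */
    uint32_t hh = ah * bh;              /* weight 2^32 */

    uint32_t mid = lh + (ll >> 16);     /* cannot overflow 32 bits */
    mid += hl;                          /* this addition can wrap  */
    if (mid < hl)
        hh += 0x10000u;                 /* carry, like "add.l #65536" */

    *hi = hh + (mid >> 16);
    *lo = (mid << 16) | (ll & 0xffffu);
}

/* Reference: high word of the signed product (assumed MULH semantics).
   Right-shifting a negative value is implementation-defined in C;
   gcc performs an arithmetic shift. */
static int32_t mulh_ref(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* Signed high word obtained from the unsigned multiply.  Treating a
   negative a as unsigned adds 2^32 to it, which adds b to the high
   word of the product (and likewise for b), so subtract that back out.
   The final cast of a value above INT32_MAX is implementation-defined;
   gcc wraps modulo 2^32. */
static int32_t mulh_from_unsigned(int32_t a, int32_t b)
{
    uint32_t hi, lo;
    umul_ppmm_c(&hi, &lo, (uint32_t)a, (uint32_t)b);
    if (a < 0)
        hi -= (uint32_t)b;
    if (b < 0)
        hi -= (uint32_t)a;
    return (int32_t)hi;
}

/* Compare both routines against 64-bit arithmetic on a few edge values. */
static void self_test(void)
{
    static const int32_t v[] = {
        0, 1, -1, 2, -2, 0x7fff, 0x10000, -0x10000,
        0x12345678, -0x12345678, INT32_MAX, INT32_MIN
    };
    const int n = (int)(sizeof v / sizeof v[0]);

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            uint32_t hi, lo;
            uint64_t p = (uint64_t)(uint32_t)v[i] * (uint32_t)v[j];

            umul_ppmm_c(&hi, &lo, (uint32_t)v[i], (uint32_t)v[j]);
            assert(hi == (uint32_t)(p >> 32));
            assert(lo == (uint32_t)p);
            assert(mulh_from_unsigned(v[i], v[j]) == mulh_ref(v[i], v[j]));
        }
    }
}
```

Calling self_test() from a small main() should pass without tripping an
assertion.  The point is only that the signed case costs two conditional
subtractions on top of the unsigned multiply, so sign-extended arguments
need not prevent the use of an umul_ppmm-style inline.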