Hi,

> I don't understand what your actual question is. What do you want gcc
> to do differently?
>
> gcc's longlong.h already has inlined assembly code for 32x32->64
> multiplication. For the 68060 it looks like this:
>
> #define umul_ppmm(xh, xl, a, b) \
> __asm__ ("| Inlined umul_ppmm\n" \
> " move%.l %2,%/d0\n" \
> " move%.l %3,%/d1\n" \
> " move%.l %/d0,%/d2\n" \
> " swap %/d0\n" \
> " move%.l %/d1,%/d3\n" \
> " swap %/d1\n" \
> " move%.w %/d2,%/d4\n" \
> " mulu %/d3,%/d4\n" \
> " mulu %/d1,%/d2\n" \
> " mulu %/d0,%/d3\n" \
> " mulu %/d0,%/d1\n" \
> " move%.l %/d4,%/d0\n" \
> " eor%.w %/d0,%/d0\n" \
> " swap %/d0\n" \
> " add%.l %/d0,%/d2\n" \
> " add%.l %/d3,%/d2\n" \
> " jcc 1f\n" \
> " add%.l %#65536,%/d1\n" \
> "1: swap %/d2\n" \
> " moveq %#0,%/d0\n" \
> " move%.w %/d2,%/d0\n" \
> " move%.w %/d4,%/d2\n" \
> " move%.l %/d2,%1\n" \
> " add%.l %/d1,%/d0\n" \
> " move%.l %/d0,%0" \
> : "=g" ((USItype) (xh)), \
> "=g" ((USItype) (xl)) \
> : "g" ((USItype) (a)), \
> "g" ((USItype) (b)) \
> : "d0", "d1", "d2", "d3", "d4")

But this code is slower than the code I posted. FFmpeg linked with an object file built from the asm code I posted is 5% faster (mp3 -> wav) on a real 68060 than with the default GCC asm code from longlong.h. This audio decoder uses MULH:

http://gnunet.org/libextractor/doxygen/html/mpegaudiodec_8c-source.html

I want to:

1. use the code I posted inline, but I don't know how to write it as inline asm correctly;
2. find out whether the default GCC code should be replaced with the code I posted: if it is already faster in FFmpeg as an externally linked object, it should be even faster when inlined.

Regards
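For reference, below is a minimal portable C sketch (not the asm posted earlier in the thread, which is not quoted here) of the same scheme the longlong.h 68060 code uses: a 32x32->64 multiply built from four 16x16->32 partial products. The 68060 does not implement the 64-bit-result forms of mulu.l/muls.l in hardware (they trap to software emulation), which is why the 16-bit mulu decomposition is used at all. The function names umul_32x32 and mulh_s32 are hypothetical; mulh_s32 is assumed to match FFmpeg's generic C MULH, i.e. ((int64_t)a * b) >> 32. Wrapping the code in a static inline function is the general pattern for inlining; an asm version in the same wrapper would likely do better with register output operands (for example "=&d") than with the hard-coded d0-d4 clobbers above, since GCC could then allocate registers itself.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 32x32->64 multiply from four 16x16->32
   partial products, the same decomposition the longlong.h 68060 asm
   performs with four mulu instructions. */
static inline void umul_32x32(uint32_t a, uint32_t b,
                              uint32_t *hi, uint32_t *lo)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    uint32_t ll = al * bl;   /* low  x low  */
    uint32_t lh = al * bh;   /* low  x high */
    uint32_t hl = ah * bl;   /* high x low  */
    uint32_t hh = ah * bh;   /* high x high */

    /* Middle 16-bit column; at most 3 * 0xFFFF, so no 32-bit overflow. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    *lo = (mid << 16) | (ll & 0xFFFFu);
    *hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}

/* Hypothetical signed variant: high 32 bits of the 64-bit signed product,
   intended to equal ((int64_t)a * b) >> 32 (FFmpeg's generic MULH). */
static inline int32_t mulh_s32(int32_t a, int32_t b)
{
    uint32_t hi, lo;
    umul_32x32((uint32_t)a, (uint32_t)b, &hi, &lo);
    (void)lo;  /* only the high word is needed here */

    /* Signed correction, modulo 2^32: a negative operand contributes
       an extra -2^32 * (other operand) to the unsigned product. */
    if (a < 0) hi -= (uint32_t)b;
    if (b < 0) hi -= (uint32_t)a;

    /* GCC defines out-of-range unsigned->signed conversion as modulo 2^32. */
    return (int32_t)hi;
}
```

For example, mulh_s32(0x40000000, 6) returns 1 and mulh_s32(-1, 1) returns -1, matching an arithmetic shift of the 64-bit product. Whether GCC actually turns this into four mulu.w instructions on m68k, and how it compares with the hand-written asm in the mp3 benchmark, would still need to be checked with gcc -S and a timing run.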