Hi,

I want to ask for help with a large piece of m68k code that I am trying to inline. It is an optimized 32x32->64 multiply function for the 68060 processor (the 68020/030/040 CPUs have this operation in hardware). I replaced the C function from FFmpeg's source code with a linkable object generated from the asm code, and the overall speedup of FFmpeg is already about 5-6% (when converting mp3 to wav) on a real 68060 CPU:

    av_always_inline int MULH(int a, int b){
        return ((int64_t)(a) * (int64_t)(b))>>32;
    }

Here is the m68k asm code:

    MULH -- signed 32 by 32 bit multiply with 64 bit result.

        result = MULH(arg1,arg2);
        D0:D1         D0   D1

    Returns the signed 64 bit result of multiplying arg1 by arg2.

        arg1, arg2 - numbers to multiply
        result     - the signed 64 bit result of multiplying arg1 by arg2.

            XDEF    _MULH
    _MULH:
            move.l  d6,-(sp)
            move.l  d5,-(sp)
            move.l  d0,d6
            beq.b   return0         ; arg1 == 0, so the result is 0
            tst.l   d1
            beq.b   return0         ; arg2 == 0, so the result is 0
            move.l  d4,-(sp)
            move.l  d3,-(sp)
            move.l  d2,-(sp)
            eor.l   d1,d6           ; bit 31 of d6 = sign of the result
            move.l  d0,d2
            bpl.b   pos0
            neg.l   d2              ; make arg1 positive
            neg.l   d0
    pos0:
            move.l  d1,d4
            bpl.b   pos1
            neg.l   d4              ; make arg2 positive
            neg.l   d1
    pos1:
            moveq   #16,d5
            move.l  d0,d3
            ror.l   d5,d4           ; high word of arg2 into low word of d4
            ror.l   d5,d3           ; high word of arg1 into low word of d3
            mulu.w  d1,d0           ; d0 = lo(arg1) * lo(arg2)
            mulu.w  d3,d1           ; d1 = hi(arg1) * lo(arg2)
            mulu.w  d4,d2           ; d2 = lo(arg1) * hi(arg2)
            mulu.w  d4,d3           ; d3 = hi(arg1) * hi(arg2)
            ror.l   d5,d0
            clr.l   d4
            add.w   d1,d0           ; add low words of the cross products
            addx.l  d4,d3           ; carry goes into the high product
            add.w   d2,d0
            addx.l  d4,d3
            lsr.l   d5,d1
            lsr.l   d5,d2
            add.l   d3,d1
            ror.l   d5,d0
            add.l   d2,d1
            tst.l   d6
            bpl.b   xit
            neg.l   d0              ; negate the 64 bit result
            negx.l  d1
    xit:
            move.l  (sp)+,d2
            move.l  (sp)+,d3
            move.l  (sp)+,d4
            move.l  (sp)+,d5
            move.l  (sp)+,d6
            move.l  d1,d0           ; line added by me: upper 32bit of the result go to d0
            rts
    return0:
            clr.l   d0
            clr.l   d1
            movem.l (sp)+,d5/d6
            rts

Maybe this code could also be used in the longlong.h file for the 68060 CPU? If so, please note that I added an asm line that moves the upper 32 bits of the result from d1 to d0; that line should be removed.

Thanks for any help!

Regards
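P.S. In case it helps anyone checking an inlined version: the scheme the asm routine uses (take magnitudes, form four 16x16->32 unsigned products, add the middle column with carry, then negate the 64 bit result if the signs differ) can be modelled in portable C roughly as below. This is only a sketch for comparing results against the int64_t version, not code from FFmpeg or longlong.h; the name mulh_model is made up for illustration, and it assumes the usual two's complement conversion from uint32_t to int32_t.

```c
#include <stdint.h>

/* Hypothetical reference model of the asm routine (name is an assumption).
   Returns the upper 32 bits of the signed 64 bit product a * b. */
static int32_t mulh_model(int32_t a, int32_t b)
{
    /* Magnitudes, computed in unsigned arithmetic like neg.l does,
       so INT32_MIN becomes 0x80000000 without overflow. */
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;

    uint32_t al = ua & 0xFFFFu, ah = ua >> 16;
    uint32_t bl = ub & 0xFFFFu, bh = ub >> 16;

    /* Four 16x16->32 partial products (the four mulu.w instructions). */
    uint32_t ll = al * bl;
    uint32_t hl = ah * bl;
    uint32_t lh = al * bh;
    uint32_t hh = ah * bh;

    /* Middle 16 bit column; mid >> 16 is the carry that the
       add.w / addx.l pairs push into the high half. */
    uint32_t mid = (ll >> 16) + (hl & 0xFFFFu) + (lh & 0xFFFFu);
    uint32_t lo  = (mid << 16) | (ll & 0xFFFFu);
    uint32_t hi  = hh + (hl >> 16) + (lh >> 16) + (mid >> 16);

    /* Signs differ: negate the 64 bit value hi:lo (neg.l / negx.l). */
    if ((a ^ b) < 0) {
        lo = 0u - lo;
        hi = ~hi + (lo == 0);   /* add the carry only when the low half was 0 */
    }

    return (int32_t)hi;         /* same value the asm leaves in d0 after my added move */
}
```

Calling mulh_model(a, b) for a set of test values and comparing against ((int64_t)a * (int64_t)b) >> 32 should give identical results, which makes it a convenient check before and after any inlining attempt.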