Re: How to inline a huge m68k code?

ami_stuff <ami_stuff@xxxxx> · Mon, 13 Jul 2009 12:43:49 +0200

Hi,

> I don't understand what your actual question is.  What do you want gcc
> to do differently?
> 
> gcc's longlong.h already has inlined assembly code for 32x32->64
> multiplication.  For the 68060 it looks like this:
> 
> #define umul_ppmm(xh, xl, a, b) \
>   __asm__ ("| Inlined umul_ppmm\n"					\
> 	   "	move%.l	%2,%/d0\n"					\
> 	   "	move%.l	%3,%/d1\n"					\
> 	   "	move%.l	%/d0,%/d2\n"					\
> 	   "	swap	%/d0\n"						\
> 	   "	move%.l	%/d1,%/d3\n"					\
> 	   "	swap	%/d1\n"						\
> 	   "	move%.w	%/d2,%/d4\n"					\
> 	   "	mulu	%/d3,%/d4\n"					\
> 	   "	mulu	%/d1,%/d2\n"					\
> 	   "	mulu	%/d0,%/d3\n"					\
> 	   "	mulu	%/d0,%/d1\n"					\
> 	   "	move%.l	%/d4,%/d0\n"					\
> 	   "	eor%.w	%/d0,%/d0\n"					\
> 	   "	swap	%/d0\n"						\
> 	   "	add%.l	%/d0,%/d2\n"					\
> 	   "	add%.l	%/d3,%/d2\n"					\
> 	   "	jcc	1f\n"						\
> 	   "	add%.l	%#65536,%/d1\n"					\
> 	   "1:	swap	%/d2\n"						\
> 	   "	moveq	%#0,%/d0\n"					\
> 	   "	move%.w	%/d2,%/d0\n"					\
> 	   "	move%.w	%/d4,%/d2\n"					\
> 	   "	move%.l	%/d2,%1\n"					\
> 	   "	add%.l	%/d1,%/d0\n"					\
> 	   "	move%.l	%/d0,%0"					\
> 	   : "=g" ((USItype) (xh)),					\
> 	     "=g" ((USItype) (xl))					\
> 	   : "g" ((USItype) (a)),					\
> 	     "g" ((USItype) (b))					\
> 	   : "d0", "d1", "d2", "d3", "d4")
> 

But this code is slow compared to the code I posted.
FFmpeg linked with an object file built from my asm code is 5% faster
(mp3 -> wav) on a real 68060 than FFmpeg built with the default GCC asm
code from longlong.h.

This audio decoder uses MULH:

http://gnunet.org/libextractor/doxygen/html/mpegaudiodec_8c-source.html
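
For reference, MULH is just the high 32 bits of a signed 32x32->64
multiplication. A minimal C sketch of what I mean (written from memory, so
treat the exact form as an assumption rather than a copy of the FFmpeg
source; it also assumes >> on a negative value is an arithmetic shift, as
it is with gcc):

```c
#include <stdint.h>

/* High 32 bits of the signed 64-bit product of a and b. */
static inline int32_t MULH(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}
```

The decoder calls this very often, which is why the speed of the 64-bit
multiplication matters so much here.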

I want two things:

1. To use the code I posted as inline code, but I don't know how to inline
it correctly.

2. If the code is already faster in FFmpeg as an externally linked object, it
will probably be even faster when inlined, so maybe the default GCC code
should be replaced with the code I posted?
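
To show the shape I have in mind for point 1, here is a rough sketch of the
same 16x16 partial-product scheme as the quoted umul_ppmm, written as a
static inline C function instead of a macro with fixed d0-d4 registers.
This is not the asm I posted, and the name umul_ppmm_c is made up for this
example:

```c
#include <stdint.h>

/* 32x32->64 unsigned multiply from four 16x16->32 products,
 * mirroring the steps of the quoted asm. */
static inline void umul_ppmm_c(uint32_t *xh, uint32_t *xl,
                               uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t ll = a_lo * b_lo;   /* low  x low  */
    uint32_t lh = a_lo * b_hi;   /* low  x high */
    uint32_t hl = a_hi * b_lo;   /* high x low  */
    uint32_t hh = a_hi * b_hi;   /* high x high */

    /* Sum the middle terms; the first addition cannot overflow. */
    uint32_t mid = lh + (ll >> 16);
    mid += hl;
    if (mid < hl)                /* carry out of the middle sum */
        hh += 0x10000u;

    *xh = hh + (mid >> 16);
    *xl = (mid << 16) | (ll & 0xFFFFu);
}
```

Written like this, the compiler picks the registers itself and can inline
the function at each call site. Whether gcc then generates code as fast as
hand-written asm on the 68060 is something I have not measured.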

Regards