Re: Building for an SH target without FPU

Segher Boessenkool <segher@xxxxxxxxxxxxxxxxxxx> · Sat, 9 Mar 2019 04:50:04 -0600

Hi!

On Sat, Mar 09, 2019 at 09:54:23AM +0100, Sébastien Michelland wrote:
> Well, I tried that first, and it turned out not to. Here's an MWE:
> 
>   #include <stdarg.h>
> 
>   void f(int x, ...)
>   {
>       va_list args;
>       va_start(args, x);
>       va_end(args);
>   }
> 
> Now build it with two sets of options:
> 
>   % sh4eb-elf-gcc -O3 -c mwe.c -o mwe-default.o
>   % sh4eb-elf-gcc -O3 -c mwe.c -o mwe-m4-nofpu.o -m4-nofpu
> 
> The first one is honestly a bit of a mess given the level of 
> optimization and the lack of any useful code. Note the fmov instructions:
> 
> 00000000 <_f>:
>    0:	7f bc       	add	#-68,r15
>    2:	61 f3       	mov	r15,r1
>    4:	e2 f8       	mov	#-8,r2
>    6:	71 04       	add	#4,r1
>    8:	21 29       	and	r2,r1
>    a:	62 13       	mov	r1,r2
>    c:	72 18       	add	#24,r2
>    e:	11 58       	mov.l	r5,@(32,r1)
>   10:	72 04       	add	#4,r2
>   12:	11 69       	mov.l	r6,@(36,r1)
>   14:	63 13       	mov	r1,r3
>   16:	11 7a       	mov.l	r7,@(40,r1)
>   18:	73 20       	add	#32,r3
>   1a:	f2 ba       	fmov	fr11,@r2      ; here
>   1c:	f2 ab       	fmov	fr10,@-r2     ; here
>   1e:	62 13       	mov	r1,r2
>   20:	72 10       	add	#16,r2
>   22:	72 04       	add	#4,r2
>   24:	f2 9a       	fmov	fr9,@r2       ; here
>   26:	f2 8b       	fmov	fr8,@-r2      ; here
>   28:	62 13       	mov	r1,r2
>   2a:	72 08       	add	#8,r2
>   2c:	72 04       	add	#4,r2
>   2e:	f2 7a       	fmov	fr7,@r2       ; here
>   30:	71 04       	add	#4,r1
>   32:	f2 6b       	fmov	fr6,@-r2      ; here
>   34:	f1 5a       	fmov	fr5,@r1       ; here
>   36:	62 f3       	mov	r15,r2
>   38:	f1 4b       	fmov	fr4,@-r1      ; here
>   3a:	72 30       	add	#48,r2
>   3c:	12 12       	mov.l	r1,@(8,r2)
>   3e:	71 2c       	add	#44,r1
>   40:	12 11       	mov.l	r1,@(4,r2)
>   42:	e1 44       	mov	#68,r1
>   44:	12 33       	mov.l	r3,@(12,r2)
>   46:	31 fc       	add	r15,r1
>   48:	22 32       	mov.l	r3,@r2
>   4a:	12 14       	mov.l	r1,@(16,r2)
>   4c:	00 0b       	rts	
>   4e:	7f 44       	add	#68,r15
> 
> But the second one is clean and FPU-free.
> 
> 00000000 <_f>:
>    0:	2f 76       	mov.l	r7,@-r15
>    2:	e1 04       	mov	#4,r1
>    4:	2f 66       	mov.l	r6,@-r15
>    6:	2f 56       	mov.l	r5,@-r15
>    8:	7f fc       	add	#-4,r15
>    a:	31 fc       	add	r15,r1
>    c:	2f 12       	mov.l	r1,@r15
>    e:	7f 04       	add	#4,r15
>   10:	00 0b       	rts	
>   12:	7f 0c       	add	#12,r15

Ideally it should compile to just an rts instruction though?
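For reference, the five mov.l stores at 0x3c to 0x4a of the default listing fill in a five-pointer va_list record. va_start has to record where the anonymous integer arguments (r5 to r7), the anonymous float arguments (fr4 to fr11) and any stack arguments can be found, and that is why all of those registers get spilled even though f never reads them. The C below is a small host-side model that replays the listing's address arithmetic. It is a sketch only: the struct is hypothetical, and its field names are assumed to mirror GCC's SH backend (__va_next_o and so on). The field order simply follows the stores at offsets 0, 4, 8, 12 and 16 from r2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the va_list record filled in by the default (FPU)
   build of f(). Field names are assumed, not taken from a real header. */
struct sh4_va_list_model {
    uint32_t next_o;        /* next anonymous int arg in the r5..r7 save area */
    uint32_t next_o_limit;  /* end of the integer register save area */
    uint32_t next_fp;       /* next anonymous float arg in the fr4..fr11 save area */
    uint32_t next_fp_limit; /* end of the float register save area */
    uint32_t next_stack;    /* first anonymous arg passed on the caller's stack */
};

/* Replays the default listing for a given r15 on entry and returns the
   values it stores into the va_list record. */
struct sh4_va_list_model model_f_prologue(uint32_t sp_entry)
{
    uint32_t sp = sp_entry - 68;             /* add #-68,r15 */
    uint32_t base = (sp + 4) & ~(uint32_t)7; /* add #4,r1 ; and r2,r1 (r2 = -8) */
    struct sh4_va_list_model va;

    /* fr4..fr11 are spilled to base+0 .. base+31,
       r5, r6, r7 to base+32, base+36, base+40. */
    va.next_o = base + 32;        /* r3 = r1 + 32 */
    va.next_o_limit = base + 44;  /* add #44,r1 */
    va.next_fp = base;            /* r1 after the final fmov fr4,@-r1 */
    va.next_fp_limit = base + 32; /* r3 stored a second time */
    va.next_stack = sp + 68;      /* mov #68,r1 ; add r15,r1, i.e. r15 on entry */

    /* The save area [base, base+44) never overlaps the record itself,
       which the listing places at r15 + 48. */
    assert(base + 44 <= sp + 48);
    return va;
}
```

With r15 = 0x1000 on entry, the model gives next_fp = 0xFC0, next_o = next_fp_limit = 0xFE0, next_o_limit = 0xFEC and next_stack = 0x1000. In the -m4-nofpu listing the record collapses to a single pointer: r7, r6 and r5 are pushed, and the value stored at @r15 (r15 + 4 at that point) is the address of the saved r5.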

> The exact same code is produced regardless of optimization level.
> 
> So I guess I'll stick to -m4-nofpu. Maybe --with-multilib-list is only 
> used to add more architectures besides the default of the target?
> 
> >For binutils, it will support all targets in the binutils (objdump etc.),
> >but not in ld or gas (I don't know about gdb).  (You may also need
> >--enable-64-bit-bfd).  This is quick to build and not too huge.
> >
> >For GCC it only means to support all targets the backend you are building
> >supports; so all SuperH variants in your case.
> 
> Does that mean I could build a compiler that covers both -m3 and
> -m4-nofpu? This would be useful because I currently use both.

I tried it out just now, and yes: if you configure with --enable-targets=all,
then -m3, -m4 and -m4-nofpu all seem to work fine.  (I only looked at trivial
code; there might be issues with multilibs, or other bugs.)  You can also
say something like --enable-targets=sh3,sh4-nofpu, but why would you?  Disk
is cheap, and it doesn't take very long to build either :-)
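When one compiler serves both -m3 and -m4-nofpu, it can be handy to let the source itself confirm which configuration it was built for. A minimal sketch, assuming GCC's SH backend predefines __sh__ for every SH target and __SH_FPU_ANY__ only when an FPU is enabled (running sh4eb-elf-gcc -m4-nofpu -dM -E - </dev/null lists what your compiler actually defines). The enum, the function and the REQUIRE_NO_FPU switch are hypothetical names for illustration only.

```c
/* Optional guard: build with -DREQUIRE_NO_FPU to turn an accidental FPU
   configuration into a hard compile error, instead of finding fmov
   instructions in the disassembly later. Assumes __SH_FPU_ANY__ is only
   predefined when the FPU is enabled. */
#if defined(REQUIRE_NO_FPU) && defined(__SH_FPU_ANY__)
#error "REQUIRE_NO_FPU is set, but this SH configuration enables the FPU"
#endif

/* Hypothetical classification of the configuration this unit was built for. */
enum sh_fpu_config {
    SH_CONFIG_NOT_SH, /* host build, or a compiler not targeting SH */
    SH_CONFIG_FPU,    /* e.g. plain -m4 */
    SH_CONFIG_NO_FPU  /* e.g. -m3 or -m4-nofpu */
};

/* Resolved entirely by the preprocessor, so it costs nothing at run time. */
enum sh_fpu_config sh_fpu_config(void)
{
#if !defined(__sh__)
    return SH_CONFIG_NOT_SH;
#elif defined(__SH_FPU_ANY__)
    return SH_CONFIG_FPU;
#else
    return SH_CONFIG_NO_FPU;
#endif
}
```

Compiled for the host this returns SH_CONFIG_NOT_SH; built with -m3 or -m4-nofpu it should return SH_CONFIG_NO_FPU, and adding -DREQUIRE_NO_FPU to a plain -m4 build should stop at the #error.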

Segher