Re: Building for an SH target without FPU

Sébastien Michelland <sebastien.mld@xxxxxxxxxxxxxx> · Sat, 9 Mar 2019 09:54:23 +0100

> I believe in this case the default should be "-m4-nofpu".  But not
> sure, better check that.  E.g. compile & link a test program (that uses
> FPU-something) for the default target and for "-m4-nofpu".  Then just
> compare the final ELF files.

Well, I tried that first, and it turns out the default is not -m4-nofpu. Here's an MWE:

  #include <stdarg.h>

  void f(int x, ...)
  {
      va_list args;
      va_start(args, x);
      va_end(args);
  }

Now build it with two sets of options:

  % sh4eb-elf-gcc -O3 -c mwe.c -o mwe-default.o
  % sh4eb-elf-gcc -O3 -c mwe.c -o mwe-m4-nofpu.o -m4-nofpu

The first one is honestly a bit of a mess given the level of 
optimization and the lack of any useful code. Note the fmov instructions:

00000000 <_f>:
   0:	7f bc       	add	#-68,r15
   2:	61 f3       	mov	r15,r1
   4:	e2 f8       	mov	#-8,r2
   6:	71 04       	add	#4,r1
   8:	21 29       	and	r2,r1
   a:	62 13       	mov	r1,r2
   c:	72 18       	add	#24,r2
   e:	11 58       	mov.l	r5,@(32,r1)
  10:	72 04       	add	#4,r2
  12:	11 69       	mov.l	r6,@(36,r1)
  14:	63 13       	mov	r1,r3
  16:	11 7a       	mov.l	r7,@(40,r1)
  18:	73 20       	add	#32,r3
  1a:	f2 ba       	fmov	fr11,@r2      ; here
  1c:	f2 ab       	fmov	fr10,@-r2     ; here
  1e:	62 13       	mov	r1,r2
  20:	72 10       	add	#16,r2
  22:	72 04       	add	#4,r2
  24:	f2 9a       	fmov	fr9,@r2       ; here
  26:	f2 8b       	fmov	fr8,@-r2      ; here
  28:	62 13       	mov	r1,r2
  2a:	72 08       	add	#8,r2
  2c:	72 04       	add	#4,r2
  2e:	f2 7a       	fmov	fr7,@r2       ; here
  30:	71 04       	add	#4,r1
  32:	f2 6b       	fmov	fr6,@-r2      ; here
  34:	f1 5a       	fmov	fr5,@r1       ; here
  36:	62 f3       	mov	r15,r2
  38:	f1 4b       	fmov	fr4,@-r1      ; here
  3a:	72 30       	add	#48,r2
  3c:	12 12       	mov.l	r1,@(8,r2)
  3e:	71 2c       	add	#44,r1
  40:	12 11       	mov.l	r1,@(4,r2)
  42:	e1 44       	mov	#68,r1
  44:	12 33       	mov.l	r3,@(12,r2)
  46:	31 fc       	add	r15,r1
  48:	22 32       	mov.l	r3,@r2
  4a:	12 14       	mov.l	r1,@(16,r2)
  4c:	00 0b       	rts	
  4e:	7f 44       	add	#68,r15

But the second one is clean and FPU-free.

00000000 <_f>:
   0:	2f 76       	mov.l	r7,@-r15
   2:	e1 04       	mov	#4,r1
   4:	2f 66       	mov.l	r6,@-r15
   6:	2f 56       	mov.l	r5,@-r15
   8:	7f fc       	add	#-4,r15
   a:	31 fc       	add	r15,r1
   c:	2f 12       	mov.l	r1,@r15
   e:	7f 04       	add	#4,r15
  10:	00 0b       	rts	
  12:	7f 0c       	add	#12,r15

The exact same code is produced regardless of optimization level.
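
The MWE above never touches a floating-point value, which is why the only visible difference is the register save in the varargs prologue. A variant that actually reads floating-point arguments should make the comparison more telling. The following is only a sketch: the function name sum_doubles is made up for illustration, and the expected outcome (FPU instructions in the default build, calls to soft-float helpers in the -m4-nofpu build) is an assumption that has not been checked against this toolchain.

```c
#include <stdarg.h>

/* Hypothetical test function: sums `count` double arguments taken
 * from the variadic list, so the compiler has to generate real
 * floating-point loads and additions. */
double sum_doubles(int count, ...)
{
    va_list args;
    double total = 0.0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, double);
    va_end(args);

    return total;
}
```

Building this file with the same two command lines as before and disassembling both objects should show whether floating-point work ends up on the FPU or in library calls.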

So I guess I'll stick to -m4-nofpu. Maybe --with-multilib-list is only 
used to add more architectures besides the default of the target?
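
A way to check which variant a given set of options selects, without reading disassembly, is to look at the compiler's predefined macros. The sketch below assumes that GCC's SH backend defines __SH_FPU_ANY__ when an FPU is targeted, and __SH3__, __SH4__ or __SH4_NOFPU__ for the CPU variant; the list the compiler really defines can be dumped with "sh4eb-elf-gcc -dM -E - < /dev/null" and should be checked first. The function names are made up for illustration.

```c
/* Returns 1 if the compiler was invoked for an SH variant with an
 * FPU, 0 otherwise. Assumes __SH_FPU_ANY__ is predefined in that
 * case; on any other compiler this simply returns 0. */
int sh_fpu_enabled(void)
{
#if defined(__SH_FPU_ANY__)
    return 1;
#else
    return 0;
#endif
}

/* Returns a short name for the selected CPU variant, based on the
 * (assumed) predefined macros of GCC's SH backend. */
const char *sh_variant(void)
{
#if defined(__SH4_NOFPU__)
    return "sh4-nofpu";
#elif defined(__SH4__)
    return "sh4";
#elif defined(__SH3__)
    return "sh3";
#else
    return "other";
#endif
}
```

Compiling this once with default options and once with -m4-nofpu, then inspecting the constants the two functions return, gives the same answer as comparing the ELF files.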

> For binutils, it will support all targets in the binutils (objdump etc.),
> but not in ld or gas (I don't know about gdb).  (You may also need
> --enable-64-bit-bfd).  This is quick to build and not too huge.
>
> For GCC it only means to support all targets the backend you are building
> supports; so all SuperH variants in your case.

Does that mean I could build a single compiler that covers both -m3 and
-m4-nofpu? That would be useful because I currently use both.

Sébastien