Re: [PATCH for-3.18 4/5] MIPS: UAPI: Ignore __arch_swab{16,32,64} when using MIPS16

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 4 Jul 2017, Amit Pundir wrote:

> From: Yousong Zhou <yszhou4tech@xxxxxxxxx>
> 
> commit 71a0a72456b48de972d7ed613b06a22a3aa9057f upstream.
> 
> Some GCC versions (e.g. 4.8.3) can incorrectly inline a function with
> MIPS32 instructions into another function with MIPS16 code [1], causing
> the assembler to genereate incorrect binary code or fail right away
> complaining about unrecognized opcode.
> 
> In the case of __arch_swab{16,32}, when inlined by the compiler with
> flags `-mips32r2 -mips16 -Os', the assembler can fail with the following
> error.
> 
>     {standard input}:79: Error: unrecognized opcode `wsbh $2,$2'
> 
> For performance concerns and to workaround the issue already existing in
> older compilers, just ignore these 2 functions when compiling with
> mips16 enabled.
> 
>  [1] Inlining nomips16 function into mips16 function can result in
>      undefined builtins, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55777

 The patch is correct, however the description does not match reality.  
There is no GCC bug involved here as: "Some GCC versions (e.g. 4.8.3) can 
incorrectly inline a function [...]" would suggest, and GCC PR 
target/55777 has nothing to do with it.

 Here you have inline functions including an inline asm each with assembly 
instructions that do not have a MIPS16 representation.  Unlike with GCC PR 
target/55777 these functions are *not* marked with 
`__attribute__((nomips16))', so the compiler is free to inline them into 
any code, including MIPS16 code in particular, not being aware that the 
inline asm is incompatible with MIPS16 assembly.  It can't be aware as the 
compiler is not an assembler and it does not interpret the string 
representing assembly code to be produced from an inline asm (beyond 
counting assembly instruction separators).

 Marking these functions with `__attribute__((nomips16))' would instruct 
the compiler to compile these functions as well as any function they get 
inlined into as regular MIPS code, which may or may not be desirable (plus 
*then* you might hit GCC PR target/55777, and IIRC some other bugs in 
older compilers).  So excluding them from MIPS16 code instead seems like a 
reasonable choice.

 OTOH, generic MIPS16 byte-swap code produced is awful, especially 
`swab32' (and `swab64' does not apply as we don't support 64-bit MIPS16 
code under Linux), so perhaps for MIPS16 code these really ought to be 
`__attribute__((nomips16, noinline))' instead, avoiding turning the caller 
into regular MIPS code as well as any old `nomips16' function inlining 
bugs; although the benefit might outweigh the cost if one of these 
functions is called from an otherwise leaf function and spilling registers 
to the stack turns out necessary where it otherwise would not be.

 Finally, for regular MIPS compilations contemporary versions of GCC 
already produce the same assembly as our <uapi/asm/swab.h> provides, e.g. 
from:

$ cat swap.c
unsigned short int swap16(unsigned short int i)
{
	return i << 8 | i >> 8;
}

unsigned int swap32(unsigned int i)
{
	return i << 24 | (i & 0xff00) << 8 | (i & 0xff0000) >> 8 | i >> 24;
}

unsigned long long int swap64(unsigned long long int i)
{
	return (i << 56 | (i & 0xff00LL) << 40 |
		(i & 0xff0000LL) << 24 | (i & 0xff000000LL) << 8 |
		(i & 0xff00000000LL) >> 8 | (i & 0xff0000000000LL) >> 24 |
		(i & 0xff000000000000LL) >> 40 | i >> 56);
}
$ 

you get:

$ gcc -O2 -mabi=64 -march=mips64r2 -c swap.c
$ objdump -d swap.o

swap.o:     file format elf64-tradbigmips


Disassembly of section .text:

0000000000000000 <swap16>:
   0:	7c0410a0 	wsbh	v0,a0
   4:	03e00008 	jr	ra
   8:	3042ffff 	andi	v0,v0,0xffff
   c:	00000000 	nop

0000000000000010 <swap32>:
  10:	7c0410a0 	wsbh	v0,a0
  14:	03e00008 	jr	ra
  18:	00221402 	ror	v0,v0,0x10
  1c:	00000000 	nop

0000000000000020 <swap64>:
  20:	7c0410a4 	dsbh	v0,a0
  24:	03e00008 	jr	ra
  28:	7c021164 	dshd	v0,v0
  2c:	00000000 	nop
$ 

so I think we ought to make all this conditional on GCC being old enough, 
as the compiler is always better suited to make code scheduling decisions 
when there's no inline asm involved.

 FWIW,

  Maciej



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]