Re: [PATCH] MIPS: UAPI: Fix unrecognized opcode WSBH/DSBH/DSHD when using MIPS16.

Yousong Zhou <yszhou4tech@xxxxxxxxx> · Sat, 5 Sep 2015 11:29:53 +0800

On 5 September 2015 at 02:52, Maciej W. Rozycki <macro@xxxxxxxxxxxxxx> wrote:
> On Thu, 3 Sep 2015, Yousong Zhou wrote:
>
>> The nomips16 has to be added both as function attribute and assembler
>> directive.
>>
>> When only function attribute is specified, the compiler will inline the
>> function with -Os optimization.  The generated assembly code cannot be
>> correctly assembled because ISA mode switch has to be done through jump
>> instruction.
>
>  This can't be true.  The compiler does not intepret the contents of an
> inline asm and therefore cannot decide whether to inline a function
> containing one or not based on the lone presence or the absence of an
> assembly directive within.
>

Most of the time I trust my compiler and never meddle with the
toolchain.  Anyway I made a patch because it really did not work for
me.   No big deal.  It's not the end of world.  It started with a
comment from OpenWrt packages feeds [1].  Actually this "unrecognized
opcode" problem have occurred within OpenWrt quite a few times before.

 [1] https://github.com/openwrt/packages/commit/1e29676a8ac74f797f8ca799364681cec575ae6f#commitcomment-12901931

>> When only ".set nomips16" directive is used, the generated assembly code
>> will use MIPS32 code for the inline assembly template and MIPS16 for the
>> function return.  The compiled binary is invalid:
>>
>>     00403100 <__arch_swab16>:
>>       403100:   7c0410a0    wsbh    v0,a0
>>       403104:   e820ea31    swc2    $0,-5583(at)
>>
>> while correct code should be:
>>
>>     00402650 <__arch_swab16>:
>>       402650:   7c0410a0    wsbh    v0,a0
>>       402654:   03e00008    jr  ra
>>       402658:   3042ffff    andi    v0,v0,0xffff
>
>  It looks to me you're papering something over here -- when you use a
> `.set nomips16' directive the assembler will happily switch your
> instruction set in the middle of an instruction stream.  Consequently if
> this function is (for whatever reason; it should not really) inlined in
> MIPS16 code, then you'll get a MIPS32 instruction in the middle, which
> will obviously be interpreted differently in the MIPS16 execution mode and
> is therefore surely a recipe for disaster.

If by "papering" you mean "made up", then whatever.  Yeah, it's
disaster, an "invalid instruction" abort.

>
>  How did you test your change and made the conclusion quoted with your
> patch description?
>

Compile the following program with a MIPS 24kc big endian variant compiler with
flag "-mips32r2 -mips16 -Os".

    #include <stdio.h>
    #include <stdint.h>

    uint16_t __attribute__((noinline)) f(uint16_t v)
    {
        v = __cpu_to_le16(v);
        return v;
    }

    int main()
    {
        printf("%x\n", f(0xbeef));

        return 0;
    }

When only ".set nomips16" was specified in __arch_swab16(), this was output
from objdump.

    242 004003e0 <f>:
    243   4003e0:       7c0410a0        wsbh    v0,a0
    244   4003e4:       e820ea31        swc2    $0,-5583(at)
    245   4003e8:       65006500        0x65006500
    246   4003ec:       65006500        0x65006500

__arch_swab16() was indeed inlined.  That swc2 instruction can be got from
assembler with the following code (it's from the "-S" result of GCC).

    .set    mips16
    .set    noreorder
    .set    nomacro
    j       $31
    zeh     $2

When only nomips16 function attribute was specified, this time GCC failed with
unrecognized opcode

    /tmp/ccaGCouL.s: Assembler messages:
    /tmp/ccaGCouL.s:21: Error: unrecognized opcode `wsbh $2,$4'

The generated assembly was something in the following form.  Looks like the
assembler did not automatically switch to MIPS32 mode when ".set arch=mip32r2"

        .set mips16

        .ent f
        .type f, @function
    f
        ...
        .set push
        .set arch=mips32r2
        wsbh $2,$4
        .pop
        j $31
        zeh $2
        .end f
        ...

The patch was run tested on QEMU Malta and an router with Atheros
AR9331 SoC.  I didn't test __arch_swab64() though.  I have done many
other trial-and-error tests while preparing this patch.  It was a mess
when I was sure I should expect some sensible behaviour from the
compiler while it actually just did not behave that way.

>> Signed-off-by: Yousong Zhou <yszhou4tech@xxxxxxxxx>
>> ---
>>  arch/mips/include/uapi/asm/swab.h |   12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/mips/include/uapi/asm/swab.h b/arch/mips/include/uapi/asm/swab.h
>> index 8f2d184..c4ddc4f 100644
>> --- a/arch/mips/include/uapi/asm/swab.h
>> +++ b/arch/mips/include/uapi/asm/swab.h
>> @@ -16,11 +16,13 @@
>>  #if (defined(__mips_isa_rev) && (__mips_isa_rev >= 2)) ||            \
>>      defined(_MIPS_ARCH_LOONGSON3A)
>>
>> -static inline __attribute_const__ __u16 __arch_swab16(__u16 x)
>> +static inline __attribute__((nomips16)) __attribute_const__
>> +             __u16 __arch_swab16(__u16 x)
>>  {
>>       __asm__(
>>       "       .set    push                    \n"
>>       "       .set    arch=mips32r2           \n"
>> +     "       .set    nomips16                \n"
>>       "       wsbh    %0, %1                  \n"
>>       "       .set    pop                     \n"
>>       : "=r" (x)
>
>  So setting aside the correctness issues discussed above, for MIPS16 code
> this has to be put out of line by the compiler, with all the usual
> overhead of a function call, causing a performance loss rather than a gain
> expected here.  Especially as switching the ISA mode requires draining the
> whole pipeline so that subsequent instructions are interpreted in the new
> execution mode.  This is expensive in performance terms.
>
>  I'm fairly sure simply disabling these asms (#ifndef __mips16) and
> letting the compiler generate the right mask and shift operations from
> plain C code will be cheaper in performance terms and possibly cheaper in
> code size as well, especially in otherwise leaf functions where an extra
> function call will according to the ABI clobber temporaries.  So I suggest
> going in that direction instead.

I agree.  Then you will provide the fix right?  I am just curious
where that __mips16 should be placed or is it from compiler and
assembler?

>
>  So this is a NAK really.

okay.

Cheers

                yousong