Re: [PATCH v2 03/13] sh: remove unaligned access for sh4a

Arnd Bergmann <arnd@xxxxxxxxxx> · Fri, 14 May 2021 14:22:58 +0200

On Fri, May 14, 2021 at 12:34 PM John Paul Adrian Glaubitz
<glaubitz@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Arnd!
>
> On 5/14/21 12:00 PM, Arnd Bergmann wrote:
> > Unlike every other architecture, sh4a uses an inline asm implementation
> > for get_unaligned(). I have shown that this produces better object
> > code than the asm-generic version. However, there are very few users of
> > arch/sh/ overall, and most of those seem to use sh4 rather than sh4a CPU
> > cores, so it seems not worth keeping the complexity in the architecture
> > independent code.
>
> My Renesas SH4-Boards actually run an sh4a-Kernel, not an sh4-Kernel:
>
> root@tirpitz:~> uname -a
> Linux tirpitz 5.11.0-rc4-00012-g10c03c5bf422 #161 PREEMPT Mon Jan 18 21:10:17 CET 2021 sh4a GNU/Linux
> root@tirpitz:~>
>
> So, if this change reduces performance on sh4a, I would rather not merge it.

It only makes a difference in very specific scenarios in which unaligned
accesses are done in a fast path, e.g. when forwarding network packets
at a high rate on a big-endian kernel (little-endian kernels wouldn't run into
this on IP headers). If you have a use case for this machine on which
you can show a performance regression, I can add a patch on top to put
the optimized sh4a get_unaligned_le32() back. Dropping this patch
altogether would make the series much more complex because most of
the associated code gets removed in the end.

As I mentioned, supporting "movua" in the compiler likely has a much
larger impact on performance, as it would also help in user space, and
it should improve the networking case on little-endian kernels by replacing
the four separate byte loads/shift pairs with a movua plus a byteswap.
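
To illustrate the two shapes, here is a rough sketch in portable C. It is
not code from this series: the function names are made up, and it assumes
the __BYTE_ORDER__ macros and __builtin_bswap32() that gcc and clang
provide. Whether the second variant actually becomes a single movua on
sh4a depends on the compiler support mentioned above.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a big-endian (network order) 32-bit value
 * one byte at a time. Safe for any alignment and any host endianness,
 * but costs four byte loads plus shifts and ORs.
 */
static inline uint32_t load_be32_bytewise(const void *p)
{
	const uint8_t *b = p;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/*
 * Hypothetical helper: one alignment-safe 32-bit load in host byte
 * order, followed by a byteswap on little-endian hosts. A compiler
 * with unaligned-load support could emit one load instruction for
 * the memcpy().
 */
static inline uint32_t load_be32_wordwise(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));	/* unaligned-safe native load */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	v = __builtin_bswap32(v);	/* convert to big-endian value */
#endif
	return v;
}
```

Both helpers return the same value for any pointer; only the generated
code differs.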

Not sure if there are gcc developers that have an active interest in sh4a
support any more.

      Arnd