Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4

Christian Lamparter <chunkeey@xxxxxxxxxxxxxx> · Thu, 12 May 2016 11:58:18 +0200

On Tuesday, May 10, 2016 09:23:59 AM Arnd Bergmann wrote:
> On Tuesday 10 May 2016 08:37:52 Benjamin Herrenschmidt wrote:
> > On Mon, 2016-05-09 at 17:08 +0200, Arnd Bergmann wrote:
> > > 
> > > Unfortunately, I don't see any way this could be done in MIPS specific
> > > code: There is typically a byteswap between the internal bus and the PCI
> > > bus on big-endian MIPS systems, so the PCI MMIO ends up being little-endian,
> > 
> > Ugh ... not exactly, re-watch my talk on the matter :-) While there is
> > a specific lane wiring to preserve byte addresss, in the end it's the
> > end device itself that is either BE or LE. Regardless of any "bus
> > endianness".
> 
> I found your slides on
> 
> http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-ref-big-little-endian-herrenschmidt.odp
> 
> but there are at least two more twists that you completely missed here:
> 
> - Some architectures (e.g. ARMv5 "BE32" mode in IXP4xx, surely some others)
>   do not implement big-endian mode by wiring up the data lines between the
>   bus and the CPU differently between big- and little-endian mode like
>   powerpc and armv7 "BE8" do, but instead they swizzle the *address* lines
>   on 8-bit and 16-bit addresses. The effect of that is that normal RAM
>   accesses work as expected both ways, and devices that are accessed using
>   32-bit MMIO ops never need any byteswap (you actually get "native
>   endian") while MMIO with 8 and 16 bit width does something completely
>   unexpected and touches the wrong register. Having an explicit byteswap
>   on the PCI host bridge gets you the expected addresses again for 8-bit
>   cycles but it also means that readl()/writel() again need to swap the
>   data.
> 
> - Some other architectures (e.g. Broadcom MIPS) apparently are even fancier
>   and use a strapping pin on the SoC flips the endianess of the CPU core
>   at the same time as all the peripheral MMIO registers, with the intention
>   of never requiring any byte swaps. I believe they are implemented careful
>   enough to actually get this right, but it confuses the heck out of
>   Linux drivers that don't expect this.
> 
> > > which matches the expected behavior of readl/writel. However, drivers
> > > for non-PCI devices often use the same readl/writel accessors because
> > > that is how it's done on ARMv6/ARMv7.
> > 
> > Even then, you can have on-SoC (non-PCI) devices that also have a
> > different endianness from the main CPU. How does it work on ARM for
> > example ? The device endianness should be fixed, regardless of the
> > endianness of the core, no ?
> 
> ARMv6/v7 is uses BE8 mode like powerpc: each peripheral is fixed-endian
> and you have to know what it is. Only Freescale managed to put identical
> IP blocks on various (powerpc-derived) SoCs and have a subset of them
> treat the access as little-endian while others remain big-endian, so all
> those drivers now require runtime detection.
> 
> > > Doing it hardcoded by architecture is just the simplest way to deal
> > > with it, working on the assumption that nothing actually needs the
> > > runtime detection that Ben suggested. 
> > 
> > No, it's not an archicture problem. It's a problem specific to that one
> > SoC that the device was synthetized to be a certain endian while it was
> > synthetized differently on another SoC... that also happens to be a
> > different architecture. But doesn't have to.
> > 
> > For example, we had in the past cases of both LE and BE EHCI
> > implementations on the same architecture (PowerPC).
> 
> I understand this, but from what I see in this history of this particular
> driver, all ARM and PowerPC implementations chose to use LE registers for
> DWC2 because the normal approach for these is to not mess with endianess,
> while presumably all MIPS users of the same block wired up the endian-select
> line of the IP block to match that of the CPU core, again because it's
> what you are expected to do on a MIPS based SoC.
> 
> So hardcoding it per architecture would make an assumption based on
> the mindset of the SoC designers rather than strict technical differences,
> and that can fail as soon as someone does things differently on any of
> them (see the Freescale example), but I still think it's the easiest
> workaround for backporting to stable kernels. A revert of the original
> patch would be even easier, but that would break the one big-endian
> MIPS machine we know about.
> 
> > > Detecting the endianess of the
> > > device is probably the best future-proof solution, but it's also
> > > considerably more work to do in the driver, and comes with a
> > > tiny runtime overhead.
> > 
> > The runtime overhead is probably non-measurable compared with the cost
> > of the actual MMIOs.
> 
> Right. The code size increase is probably measurable (but still small),
> the runtime overhead is not.

Ok, so no rebuts or complains have been posted.

I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
and it works: 

Tested-by: Christian Lamparter <chunkeey@xxxxxxxxxxxxxx>

So, how do we go from here? There is are two small issues with the
original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
#ifdef dwc2_log_writes) and I guess a proper subject would be nice.  

Arnd, can you please respin and post it (cc'd stable as well)?
So this is can be picked up? Or what's your plan?

Regards,
Christian