Re: [PATCH v5 3/5] drivers/soc/litex: add LiteX SoC Controller driver

"Gabriel L. Somlo" <gsomlo@xxxxxxxxx> · Wed, 29 Apr 2020 07:32:09 -0400

Hi Ben,

On Wed, Apr 29, 2020 at 01:21:11PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2020-04-27 at 11:13 +0200, Mateusz Holenko wrote:
> > As Gabriel Somlo <gsomlo@xxxxxxxxx> suggested to me, I could still use
> > readl/writel/ioread/iowrite() standard functions providing memory
> > barriers *and* have values in CPU native endianness by using the
> > following constructs:
> > 
> > `le32_to_cpu(readl(addr))`
> > 
> > and
> > 
> > `writel(cpu_to_le32(value), addr)`
> > 
> > as le32_to_cpu/cpu_to_le32():
> > - does nothing on LE CPUs and
> > - reorders bytes on BE CPUs which in turn reverts swapping made by
> > readl() resulting in returning the original value.
> 
> It's a bit sad... I don't understand why you need this. The HW has a
> fixed endian as you have mentioned earlier (and that is a good design).
> 
> The fact that you are trying to shove things into a "smaller pipe" than
> the actual register shouldn't affect at what address the MSB and LSB
> reside. And readl/writel (or ioread32/iowrite32) will always be LE as
> well, so will match the HW layout. Thus I don't see why you need to
> play swapping games here.
> 
> This however would be avoided completely if the HW was a tiny bit
> smarter and would do the multi-beat access for you which shouldn't be
> terribly hard to implement.
> 
> That said, it would be even clearer if you just open coded the 2 or 3
> useful cases: 32/8, 32/16 and 32/32. The loop with calculated shifts
> (and no masks) makes the code hard to understand.

A "compound" LiteX MMIO register of 32 bits total, starting at address
0x82000004 and containing the value 0x12345678, is spread across four
8-bit subregisters, each aligned at ulong in the MMIO space, like this
on LE:

0x82000000  00 00 00 00 12 00 00 00 34 00 00 00 56 00 00 00  ........4...V...
                        ^^^^^^^^^^^ ^^^^^^^^^^^ ^^^^^^^^^^^
0x82000010  78 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  x...............
            ^^^^^^^^^^^

and like this on BE:

0x82000000  00 00 00 00 00 00 00 12 00 00 00 34 00 00 00 56  ...........4...V
                        ^^^^^^^^^^^ ^^^^^^^^^^^ ^^^^^^^^^^^
0x82000010  00 00 00 78 00 00 00 00 00 00 00 00 00 00 00 00  ...x............
            ^^^^^^^^^^^

LiteX can optionally be built with subregisters wider than 8 bits.
Here's an example with 16-bit subregisters (also aligned at ulong),
for the same "compound" register:

on LE:
0x82000000  00 00 00 00 34 12 00 00 78 56 00 00 00 00 00 00  ....4...xV......
                        ^^^^^^^^^^^ ^^^^^^^^^^^

and on BE:
0x82000000  00 00 00 00 00 00 12 34 00 00 56 78 00 00 00 00  .......4..Vx....
                        ^^^^^^^^^^^ ^^^^^^^^^^^

Essentially (back to the more common 8-bit subregister size), a compound
register foo = 0x12345678 is stored as

	ulong foo[4] = {0x12, 0x34, 0x56, 0x78};

in the CPU's native endianness, aligned at the CPU's native word width
(hence "ulong").

With 16-bit subregisters that would then be:

	ulong foo[2] = {0x1234, 0x5678};
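
To make that layout concrete, here's a minimal userspace sketch (not the
driver code from this series) of combining the subregisters back into
the compound value. The helper name compound_read() and its parameters
are made up for illustration, and it models the MMIO slots as a plain
array rather than doing real I/O:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model only (hypothetical helper, not from the patch series):
 * combine `count` subregisters, each holding `subreg_bits` significant
 * bits in its own ulong-aligned slot, into one compound value, most
 * significant subregister first.
 */
static uint32_t compound_read(const unsigned long *subreg, size_t count,
			      unsigned int subreg_bits)
{
	uint32_t mask = (subreg_bits >= 32) ? UINT32_MAX
					    : ((UINT32_C(1) << subreg_bits) - 1);
	uint32_t val = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		/* Shifting a 32-bit value by 32 is undefined, so special-case it. */
		val = (subreg_bits >= 32) ? 0 : (val << subreg_bits);
		/* Keep only the bits that belong to this subregister. */
		val |= (uint32_t)subreg[i] & mask;
	}
	return val;
}
```

Fed the 8-bit array above with a count of 4, or the 16-bit array with a
count of 2, it yields 0x12345678 in both cases, provided each slot was
read in the CPU's native endianness.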

The trouble with readl() and writel() is that they assume LE MMIO and
byte-swap internally on BE CPUs, which would get us something different
*within* each subregister (i.e., 0x12000000 instead of 0x12, or
0x34120000 instead of 0x1234).

The cleanest way (IMHO) to accomplish an endian-agnostic readl() (that
preserves both barriers AND native endianness) is to undo the internal
__le32_to_cpu() using:

	cpu_to_le32(readl(addr))

This keeps us away from using any '__' internals directly (e.g.,
__raw_readl()), or open-coding our own `litex_readl()`, e.g.:

	static inline u32 litex_readl(const volatile void __iomem *addr)
	{
		u32 val;
		__io_br();
		val = __raw_readl(addr); /* No le32 byteswap here! */
		__io_ar(val);
		return val;
	}

... which is something that was strongly advised against in earlier
revisions of this series.
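
For anyone who wants to convince themselves of the swap-and-undo
behaviour without BE hardware, here's a small userspace model. It is
only a sketch: the model_*() names are invented for illustration, the
"MMIO" is an ordinary variable, no barriers are involved, and the byte
swap is applied unconditionally to mimic what le32_to_cpu() and
cpu_to_le32() do on a BE CPU:

```c
#include <stdint.h>

/*
 * Unconditional 32-bit byte swap. Stands in for le32_to_cpu() and
 * cpu_to_le32() as they behave on a big-endian CPU (on LE both are
 * no-ops). Hypothetical name, for illustration only.
 */
static uint32_t model_swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

/* Model of readl() on BE: native load, then the LE-to-CPU byte swap. */
static uint32_t model_readl_be(const uint32_t *addr)
{
	return model_swab32(*addr);
}

/*
 * Model of cpu_to_le32(readl(addr)) on BE: the second swap cancels
 * the first, so the native subregister value comes back unchanged.
 */
static uint32_t model_litex_read_be(const uint32_t *addr)
{
	return model_swab32(model_readl_be(addr));
}
```

With a subregister holding 0x12, model_readl_be() gives 0x12000000 and
model_litex_read_be() gives 0x12 again; with 0x1234 the two results are
0x34120000 and 0x1234.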

Cheers,
--Gabriel