Hi Ben,

On Wed, Apr 29, 2020 at 01:21:11PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2020-04-27 at 11:13 +0200, Mateusz Holenko wrote:
> > As Gabriel Somlo <gsomlo@xxxxxxxxx> suggested to me, I could still use
> > readl/writel/ioread/iowrite() standard functions providing memory
> > barriers *and* have values in CPU native endianness by using the
> > following constructs:
> >
> > `le32_to_cpu(readl(addr))`
> >
> > and
> >
> > `writel(cpu_to_le32(value), addr)`
> >
> > as le32_to_cpu/cpu_to_le32():
> > - does nothing on LE CPUs and
> > - reorders bytes on BE CPUs which in turn reverts swapping made by
> >   readl() resulting in returning the original value.
>
> It's a bit sad... I don't understand why you need this. The HW has a
> fixed endian as you have mentioned earlier (and that is a good design).
>
> The fact that you are trying to shove things into a "smaller pipe" than
> the actual register shouldn't affect at what address the MSB and LSB
> reside. And readl/writel (or ioread32/iowrite32) will always be LE as
> well, so will match the HW layout. Thus I don't see why you need to
> play swapping games here.
>
> This however would be avoided completely if the HW was a tiny bit
> smarter and would do the multi-beat access for you which shouldn't be
> terribly hard to implement.
>
> That said, it would be even clearer if you just open coded the 2 or 3
> useful cases: 32/8, 32/16 and 32/32. The loop with calculated shifts
> (and no masks) makes the code hard to understand.

A "compound" LiteX MMIO register of 32 bits total, starting at address
0x82000004, containing value 0x12345678, is spread across four 8-bit
subregisters aligned at ulong in the MMIO space like this on LE:

0x82000000  00 00 00 00 12 00 00 00 34 00 00 00 56 00 00 00  ........4...V...
                        ^^^^^^^^^^^ ^^^^^^^^^^^ ^^^^^^^^^^^
0x82000010  78 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  x...............
            ^^^^^^^^^^^

and like this on BE:

0x82000000  00 00 00 00 00 00 00 12 00 00 00 34 00 00 00 56  ...........4...V
                        ^^^^^^^^^^^ ^^^^^^^^^^^ ^^^^^^^^^^^
0x82000010  00 00 00 78 00 00 00 00 00 00 00 00 00 00 00 00  ...x............
            ^^^^^^^^^^^

LiteX can optionally be built to use subregisters larger than 8 bits;
here's an example with 16-bit subregisters (also aligned at ulong), for
the same "compound" register:

on LE:

0x82000000  00 00 00 00 34 12 00 00 78 56 00 00 00 00 00 00  ....4...xV......
                        ^^^^^^^^^^^ ^^^^^^^^^^^

and on BE:

0x82000000  00 00 00 00 00 00 12 34 00 00 56 78 00 00 00 00  .......4..Vx....
                        ^^^^^^^^^^^ ^^^^^^^^^^^

Essentially (back to the more common 8-bit subregister size), a compound
register foo = 0x12345678 is stored as

	ulong foo[4] = {0x12, 0x34, 0x56, 0x78};

in the CPU's native endianness, aligned at the CPU's native word width
(hence "ulong").

With 16-bit subregisters that would then be:

	ulong foo[2] = {0x1234, 0x5678};

Trouble with readl() and writel() is that they assume LE on the device
side and byteswap internally on BE, which would get us something
different *within* each subregister (i.e., 0x12000000 instead of 0x12,
or 0x34120000 instead of 0x1234).

The cleanest way (IMHO) to accomplish an endian-agnostic readl() (that
preserves both barriers AND native endianness) is to undo the internal
__le32_to_cpu() using:

	cpu_to_le32(readl(addr))

This keeps us away from using any '__' internals directly (e.g.,
__raw_readl()), or open-coding our own `litex_readl()`, e.g.:

	static inline u32 litex_readl(const volatile void __iomem *addr)
	{
		u32 val;

		__io_br();
		val = __raw_readl(addr); /* No le32 byteswap here! */
		__io_ar(val);
		return val;
	}

... which is something that was strongly advised against in earlier
revisions of this series.

Cheers,
--Gabriel
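
P.S. For anyone who wants to check the double-swap argument without a BE
board handy, here is a rough userspace sketch. It is not kernel code:
swab32(), native_load(), model_readl(), model_litex_read() and
read_compound32() are made-up names for illustration, barriers are not
modeled at all, and the CPU's endianness is passed in as a flag rather
than detected, so both cases can be tried on any host. The two arrays
hold the 8-bit subregister dumps from above, starting at the compound
register's own offset.

```c
#include <assert.h>
#include <stdint.h>

/* Subregister bytes as dumped above, starting at the compound register. */
static const uint8_t le_layout[16] = {
	0x12, 0, 0, 0,  0x34, 0, 0, 0,  0x56, 0, 0, 0,  0x78, 0, 0, 0,
};
static const uint8_t be_layout[16] = {
	0, 0, 0, 0x12,  0, 0, 0, 0x34,  0, 0, 0, 0x56,  0, 0, 0, 0x78,
};

/* Unconditional 32-bit byte swap. */
static uint32_t swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* What a plain 32-bit load yields on a CPU of the given endianness. */
static uint32_t native_load(const uint8_t *p, int cpu_is_be)
{
	if (cpu_is_be)
		return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
		       (uint32_t)p[2] << 8 | (uint32_t)p[3];
	return (uint32_t)p[3] << 24 | (uint32_t)p[2] << 16 |
	       (uint32_t)p[1] << 8 | (uint32_t)p[0];
}

/* Model of readl(): native load, then le32_to_cpu() (a swap on BE only). */
static uint32_t model_readl(const uint8_t *p, int cpu_is_be)
{
	uint32_t v = native_load(p, cpu_is_be);

	return cpu_is_be ? swab32(v) : v;
}

/* Model of cpu_to_le32(readl(addr)): the second swap undoes the first. */
static uint32_t model_litex_read(const uint8_t *p, int cpu_is_be)
{
	uint32_t v = model_readl(p, cpu_is_be);

	return cpu_is_be ? swab32(v) : v;
}

/* Assemble a 32-bit compound register from four 8-bit subregisters,
 * each sitting in its own 4-byte slot, most significant first. */
static uint32_t read_compound32(const uint8_t *base, int cpu_is_be)
{
	uint32_t val = 0;
	int i;

	for (i = 0; i < 4; i++)
		val = (val << 8) | (model_litex_read(base + 4 * i, cpu_is_be) & 0xff);
	return val;
}

/* Quick self-check of the claims made in the mail. */
void litex_model_selfcheck(void)
{
	assert(model_readl(be_layout, 1) == 0x12000000u);   /* bare readl() on BE */
	assert(read_compound32(le_layout, 0) == 0x12345678u);
	assert(read_compound32(be_layout, 1) == 0x12345678u);
}
```

Calling litex_model_selfcheck() from a trivial main() should pass
silently; the first assert is the "0x12000000 instead of 0x12" case, the
other two show the compound value coming out the same on both layouts
once the extra swap is applied.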