On Mon, Mar 26, 2018 at 6:21 PM, Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote: > > > On 26/03/18 04:53 AM, Arnd Bergmann wrote: >> On most architectures, this is not important: >> - For x86, the stores are aways atomic and no additional barriers >> are needed, so the two are the same >> - For ARM (both 32 and 64-bit), powerpc and many others, we don't >> use the generic iowrite() and just fall back to writel() or >> writel(swab32()). >> >> However, shouldn't we just use the writel(swab32()) logic here as well >> for the common case rather than risking missing barriers? > > Hmm, I don't know... it's complicated? > > Doing a bit of digging shows that the existing code was written during a > time when writel() did not include extra barriers over __raw_writel() in > any of the common arches. > > The commit logs don't seem to provide any guidance as to why this it was > done this way, but I'd assume it was done to avoid a double swab() call > on BE arches. Seeing writel() is typically implemented as: > > __raw_writel(__cpu_to_le32(value), addr); > > Then on BE arches, writel(swab32()) would become: > > __raw_writel(swab32(swab32(value)), addr) > > Which seems undesirable. I wouldn't expect it to matter: the byte swap is almost always much cheaper compared to the actual bus access for the MMIO, and I would also guess that modern compilers can eliminate the double byte swap on architectures where writel() is an inline function. Most of the important architectures use ARCH_USE_BUILTIN_BSWAP, which guarantees that. Arnd