/ From: Benjamin Herrenschmidt [benh@xxxxxxxxxxxxxxxxxxx] | Sent: Wednesday, June 24, 2015 4:38 PM | | It is to be noted that on powerpc at least, writel() and co will never | combine due to the memory barriers in them. Only "normal" stores (or \ __raw_writel) will. / From: Benjamin Herrenschmidt [benh@xxxxxxxxxxxxxxxxxxx] | Sent: Wednesday, June 24, 2015 5:52 PM | | Right, and as I mentioned, on some archs like powerpc (and possibly \ more), writel() and co contains an implicit mb() Hhmmm, interesting. I definitely hadn't known that. In our network drivers we explicitly manage all of our own memory barrier semantics. (wmb() between writes to DMA Queues and writes to Device Registers informing hardware of new data available to be DMA'ed, etc.) And we do have code to take advantage of Write Combining Mappings using writeq(). It looks like this is implemented eventually as out_be64() in arch/powerpc/include/asm/io.h for PPC-BE: #define DEF_MMIO_OUT_D(name, size, insn) \ static inline void name(volatile u##size __iomem *addr, u##size val) \ { \ __asm__ __volatile__("sync;"#insn"%U0%X0 %1,%0" \ : "=m" (*addr) : "r" (val) : "memory"); \ IO_SET_SYNC_FLAG(); \ } DEF_MMIO_OUT_D(out_be64, 64, std); So, if understand your notes and the source correctly, all of our code to do register reads/writes are getting SYNC instructions, as are the 64-bit writeq()s we're attempting to use for our Write Combining code on PowerPC. Hhmmm indeed ... Is there a reference I can read on this so I can understand when and where we can use the __raw_*() APIs? Can these Raw Read/Write operations be reordered with respect to each other or are the use of the various flavors of SYNC instructions just to maintain order between Cached Memory Accesses and I/O Instructions? Casey-- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html