On Thu, Mar 22, 2018 at 1:35 AM, David Laight <David.Laight@xxxxxxxxxx> wrote:
>> x86 has compiler barrier inside the relaxed() API so that code does not
>> get reordered. ARM64 architecturally guarantees device writes to be observed
>> in order.
>
> There are places where you don't even need a compile barrier between
> every write.
>
> I had horrid problems getting some ppc code (for a specific embedded SoC)
> optimised to have no extra barriers.
> I ended up just writing through 'pointer to volatile' and adding an
> explicit 'eieio' between the block of writes and status read.

This is what you are supposed to do. For accesses to MMIO (cache
inhibited + guarded) storage, the Power ISA guarantees that load-load
and store-store pairs of accesses will always occur in program order,
but there's no implicit ordering between load-store or store-load
pairs. In those cases you need an explicit eieio barrier between the
two accesses.

At the HW level you can think of the CPU as having separate queues for
MMIO loads and stores. Accesses are added to the respective queue in
program order, but there's no synchronisation between the two queues.
If the CPU is doing write combining, it's easy to imagine the whole
store queue being emptied in one big gulp before the load queue is even
touched.

> No less painful was doing a byteswapping write to normal memory.

What was the problem? The reverse indexed load/store instructions are a
little awkward to use, but they work...