On Tue, May 27, 2014 at 09:21:38PM +0100, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-27 at 20:32 +0100, Will Deacon wrote:
>
> > Why would you need two barriers? I would have thought an mmiowb() inlined
> > into writel after the store operation would be sufficient. Or is this to
> > ensure a non-relaxed write is ordered with respect to a relaxed write?
>
> Well, so the non-relaxed writel would have to do:
>
> 	sync
> 	store
> 	sync
>
> The first sync is to synchronize with DMAs, so that a sequence of
>
> 	store to mem
> 	writel
>
> Remains ordered vs. the device (ie, when the writel causes the device
> to do a DMA, it will see the previous store to mem).
>
> The second sync is needed as mmiowb, to order with unlocks.

Ah yeah, thanks. I was so hung up on the ordering against locks that I
completely forgot about DMA!

> At this point, I'm keen on keeping my per-cpu trick to avoid that
> second one in most cases.

Makes sense. The alternative is dropping that requirement and instead
relying on drivers to use mmiowb() even with the non-relaxed accessors, but
I think that's going to be fairly painful (and hence why you have the trick
to start with).

Will
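
The sync/store/sync sequence and the saving from deferring the second sync can be modelled in user space. This is only a rough sketch under stated assumptions: the thread does not spell out how the per-cpu trick works, so the code assumes writel sets a per-cpu "MMIO write pending" flag and the unlock path issues the second sync only when that flag is set. Every name here (model_sync, model_writel_naive, model_writel_deferred, model_spin_unlock, io_sync_pending) is hypothetical, and the barrier is replaced by a counter so the two schemes can be compared.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model; none of this is real kernel code. */
static unsigned int sync_count;      /* number of heavyweight barriers issued */
static int io_sync_pending;          /* assumed per-cpu flag: MMIO write not yet synced */
static volatile uint32_t fake_reg;   /* stands in for a device register */

static void model_sync(void)
{
	sync_count++;                /* a real implementation would issue a barrier here */
}

/* Non-relaxed write as described in the thread: sync, store, sync. */
static void model_writel_naive(uint32_t val, volatile uint32_t *reg)
{
	model_sync();                /* order earlier stores to memory before the MMIO store (DMA) */
	*reg = val;
	model_sync();                /* acts as mmiowb(): order the MMIO store before a later unlock */
}

/* Assumed deferred variant: skip the trailing sync and note that one is owed. */
static void model_writel_deferred(uint32_t val, volatile uint32_t *reg)
{
	model_sync();                /* the DMA ordering barrier is still required */
	*reg = val;
	io_sync_pending = 1;
}

/* Unlock pays for the deferred sync only if an MMIO write happened. */
static void model_spin_unlock(int *lock)
{
	assert(*lock == 1);          /* the model expects the lock to be held */
	if (io_sync_pending) {
		model_sync();
		io_sync_pending = 0;
	}
	*lock = 0;
}

/* Count the syncs used by n writes inside one critical section. */
static unsigned int run_naive(int n)
{
	unsigned int start = sync_count;
	int lock = 1;

	for (int i = 0; i < n; i++)
		model_writel_naive((uint32_t)i, &fake_reg);
	model_spin_unlock(&lock);
	return sync_count - start;
}

static unsigned int run_deferred(int n)
{
	unsigned int start = sync_count;
	int lock = 1;

	for (int i = 0; i < n; i++)
		model_writel_deferred((uint32_t)i, &fake_reg);
	model_spin_unlock(&lock);
	return sync_count - start;
}
```

In this model, n writes inside one locked region cost 2n syncs with the naive accessor and n + 1 with the deferred one, and a critical section with no MMIO writes pays nothing extra at unlock. The alternative mentioned above, having drivers call mmiowb() themselves, would give the same count but only if every driver places the barrier correctly.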