On Tue, Oct 9, 2012 at 1:03 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> In fact, not only is that wmb() the easiest to micro-optimise, it
> is the only one that can be - I think. Around manipulating the fence, we
> need a read/write barrier in case we have any pending accesses through
> the fenced region, since the register write may be reordered past the
> memory reads since there is no obvious dependency. That might just be
> heightened paranoia and our memory controller isn't that smart. Yet. So
> those two need to be mb() so that I can sleep safely at night. For the
> mb() inside set-to-gtt-domain, I don't have a robust explanation other
> than that empirically we need a barrier, therefore there is some
> lingering incoherency when reusing a bo. (The hangs always seem to occur
> when crossing a page boundary, we see stale data.) You could attempt to
> insert a read/write barrier depending upon actual usage, but it hardly
> seems worth the effort.

Hm, I think we can make a case that the barrier before the fence change
can be just a wmb(): Racing reads against fence changes is ill-defined
anyway, so we don't need the read barrier for that reason. But we do need
to flush out any store buffers (especially the wc store buffer) before
changing the fencing.

The mb() afterwards seems to be required, since we need to sync both
subsequent reads and writes against the fence mmio write.

One thing I wonder is whether we are missing a barrier between the wc
writes to the ringbuffer and the tail update. If that's the case, I wonder
where all the bug reports are ...

Last one: Which machines blow up when you drop that mb()?

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch