On Tue, 9 Oct 2012 19:24:37 +0100
Chris Wilson <chris at chris-wilson.co.uk> wrote:

> With a fence, we only need to insert a memory barrier around the actual
> fence alteration for CPU accesses through the GTT. Performing the
> barrier in flush-fence was inserting unnecessary and expensive barriers
> for never fenced objects.
>
> Note removing the barriers from flush-fence, which was effectively a
> barrier before every direct access through the GTT, revealed that we
> were missing a barrier before the first access through the GTT. Lack of
> that barrier was sufficient to cause GPU hangs.
>
> v2: Add a couple more comments to explain the new barriers
>

The docs are slippery on MMIO vs cached accesses (less so on actual I/O
port ops), but this does look correct. You might improve the comments a
little and quote the IA32 manuals a bit, saying that you're trying to
order previous cached accesses with subsequent MMIO accesses that will
affect what the CPU reads or writes.

Other than that:
Reviewed-by: Jesse Barnes <jbarnes at virtuousgeek.org>

--
Jesse Barnes, Intel Open Source Technology Center