On Mon, Jan 11, 2016 at 12:11:05PM -0800, Linus Torvalds wrote:
> On Mon, Jan 11, 2016 at 3:28 AM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Bizarrely,
> >
> > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > index 6000ad7..cf074400 100644
> > --- a/arch/x86/mm/pageattr.c
> > +++ b/arch/x86/mm/pageattr.c
> > @@ -141,6 +141,7 @@ void clflush_cache_range(void *vaddr, unsigned int size)
> >  	for (; p < vend; p += clflush_size)
> >  		clflushopt(p);
> >
> > +	clflushopt(vend-1);
> >  	mb();
> >  }
> >  EXPORT_SYMBOL_GPL(clflush_cache_range);
> >
> > works like a charm.
>
> Have you checked all your callers? If the above makes a difference, it
> really sounds like the caller has passed in a size of zero, resulting
> in no cache flush, because the caller had incorrect ranges. The
> additional clflushopt now flushes the previous cacheline that wasn't
> flushed correctly before.
>
> That "size was zero" thing would explain why changing the loop to "p
> <= vend" also fixes things for you.

This is on top of HPA's suggestion to do the size==0 check up front.

> IOW, just how sure are you that all the ranges are correct?

All our callers are of the pattern:

	memcpy(dst, vaddr, size)
	clflush_cache_range(dst, size)

or

	clflush_cache_range(vaddr, size)
	memcpy(dst, vaddr, size)

I am reasonably confident that the ranges are sane. I've tried to verify
that we do the clflushes by forcing them. However, if I clflush the
whole object instead of just the cachelines around the copies, the tests
pass. (Since that flushes up to a couple of megabytes instead of a few
hundred bytes, it is hard to draw any conclusions about what the bug
might be.) I can at least narrow down the principal buggy path by doing
the clflush(vend-1) in the callers.

The problem is that the tests that fail are the ones looking for bugs in
the coherency code, which may just as well be caused by the GPU writing
into those ranges at the same time as the CPU is trying to read them.
I've looked into the timing and tried adding udelay()s or uncached mmio
along the suspect paths, but that didn't change the presentation - and
having a udelay() fix the issue is usually a good indicator of a GPU
write that hasn't landed before the CPU read.

The bug only affects a couple of recent non-coherent platforms; earlier
Atoms and older Core seem unaffected. That may also mean that it is the
GPU flush instruction that changed between platforms and isn't working
(as we intended, at least).

Thanks for everyone's help and suggestions,
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel
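
For reference, here is a minimal sketch of clflush_cache_range() with
both changes discussed in this thread folded in: hpa's up-front size==0
check and the extra flush of the final cacheline. The setup of p and
vend is assumed from the surrounding 4.4-era pageattr.c (the hunk above
shows only the loop), so treat this as an illustration of the combined
workaround, not the exact tree state:

	void clflush_cache_range(void *vaddr, unsigned int size)
	{
		const unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
		void *p = (void *)((unsigned long)vaddr & ~(clflush_size - 1));
		void *vend = vaddr + size;

		if (!size)		/* hpa's suggestion: empty range is a no-op */
			return;

		mb();			/* order prior writes before the flush loop */

		for (; p < vend; p += clflush_size)
			clflushopt(p);

		clflushopt(vend - 1);	/* workaround under test: re-flush last line */

		mb();			/* fence the weakly-ordered clflushopt */
	}

Note that if size == 0, clflushopt(vend - 1) would touch the cacheline
just before vaddr, which is exactly why Linus suspects the extra flush
is papering over a miscomputed range in some caller; the early return
above turns the empty-range case into a no-op instead.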