On Tue, Jan 12, 2016 at 06:06:34PM -0800, Linus Torvalds wrote:
> On Tue, Jan 12, 2016 at 4:55 PM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > The double clflush() remains a mystery.
>
> Actually, I think it's explainable.
>
> It's wrong to do the clflush *after* the GPU has done the write, which
> seems to be what you are doing.
>
> Why?
>
> If the GPU really isn't cache coherent, what can happen is:
>
>  - the CPU has the line cached
>
>  - the GPU writes the data
>
>  - you do the clflushopt to invalidate the cacheline
>
>  - you expect to see the GPU data.
>
> Right?
>
> Wrong. The above is complete crap.
>
> Why?
>
> Very simple reason: the CPU may have had the cacheline dirty at some
> level in its caches, so when you did the clflushopt, it didn't just
> invalidate the CPU cacheline, it wrote it back to memory. And in the
> process over-wrote the data that the GPU had written.

Forgive me for being dense, but if we overwrite the GPU data in the
backing struct page with the cacheline from the CPU, how do we see the
results from the GPU afterwards?

> Now you can say "but the CPU never wrote to the cacheline, so it's not
> dirty in the CPU caches". That may or may not be true. The CPU may
> have written to it quite a long time ago.

What we do is clflush the entire object after it is written to by the
CPU (including newly allocated objects from shmemfs, or pages being
returned to us by shmemfs) before any operation on that object by the
GPU. We have to, so that the GPU sees the correct page contents. (If I
change the clflush of the written objects to a wbinvd, that is not
sufficient for the tests to pass.)

We do not clflush the object after we read the backing pages on the CPU
before the next GPU operation, even if that operation is a GPU write.
This should leave us with clean but stale cachelines. That is why we
then clflush_cache_range() prior to the next read on the object; it is
intended to be a pure cacheline invalidation.

If we clflush the entire object between every CPU read back and the
*next* GPU operation, it fails. If we clflush the object before every
GPU write to it, it passes. And to recap, we always clflush after a CPU
write.

I am reasonably confident that any cachelines we dirty (or inherit) are
flushed. What you are suggesting is that there are dirty cachelines
regardless. I am also reasonably confident that even if we clflush the
entire object after touching it before the GPU write, and clflush the
individual cachelines again after the GPU write, we still see the
errors.

I haven't found the hole yet, nor have I been able to convincingly
explain the differences between generations.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
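
As a minimal sketch of the flush ordering described above (not the
actual i915 code paths): gpu_write() below is a hypothetical stand-in
for a non-coherent GPU write to the same backing pages, and
_mm_clflush()/_mm_mfence() from <immintrin.h> stand in for the kernel's
clflush/clflushopt loops. The comments only mark where the flushes sit
relative to the CPU write, the GPU write, and the CPU read-back.

/*
 * Hedged illustration of the ordering under discussion -- not i915 code.
 */
#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

#define CACHELINE 64

/* Hypothetical stand-in: pretend this is a non-coherent GPU write. */
static void gpu_write(void *buf, size_t len) { (void)buf; (void)len; }

static void clflush_range(void *addr, size_t len)
{
	char *p = (char *)((uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1));
	char *end = (char *)addr + len;

	for (; p < end; p += CACHELINE)
		_mm_clflush(p);
	_mm_mfence();	/* order the flushes against later accesses */
}

static void example(char *obj, size_t len)
{
	/* 1. CPU writes the object. */
	obj[0] = 1;

	/*
	 * 2. Always clflush after a CPU write, *before* handing the object
	 *    to the GPU, so the GPU sees the written contents and no dirty
	 *    CPU cachelines should remain to be written back later.
	 */
	clflush_range(obj, len);

	/* 3. GPU writes the object, bypassing the CPU caches. */
	gpu_write(obj, len);

	/*
	 * 4. Before the CPU reads the result, flush the (clean but stale)
	 *    cachelines so the read pulls the GPU's data from memory. This
	 *    is the clflush_cache_range() step in the mail above; the
	 *    concern raised is that if any line is still dirty at this
	 *    point, the flush writes it back and overwrites the GPU's data.
	 */
	clflush_range(obj, len);
	(void)obj[0];	/* CPU read-back of the GPU-written data */
}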