On Mon, Jan 14, 2013 at 10:06 PM, Imre Deak <imre.deak at intel.com> wrote:
> Well, the first flush would first write any valid data in the cache line
> to memory and only then invalidate it. But this would have the same
> result as just doing away with the first flush: the original cache line
> updated by the dword.
>
> And if the cache line was invalid the first flush is a no-op and the
> write would reload the cache line from memory as you pointed out.

The thing is that coherency doesn't work that way if the other side
doesn't send out snoop notices:

1. cpu reads the cacheline into its cache. Note that prefetch is good
   enough for this.
2. gpu writes new values to the same location in memory, which updates
   main memory but doesn't change anything in the cpu cache state.
3. cpu writes that dword and updates its cacheline.
4. clflush writes that cacheline out to main memory. Note that the gpu
   write from step 2 is now overwritten.

I admit that it's really hard to come up with a real-world scenario
involving relocations (since usually it's the cpu which has last written
the batch, with the gpu only reading it in between). But for pwrite it's
mandatory for correctness, so I don't want to take any changes.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
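The four-step race can be replayed in a small toy model. This is a sketch under stated assumptions, not i915 code. The model assumes a write-back, write-allocate CPU cache holding a single 16-dword line. It treats GPU writes as going straight to memory without snooping the CPU cache, and it treats clflush as writing back a dirty line and then invalidating it. All names (`ToySystem`, `cpu_prefetch`, `gpu_write`, `cpu_write`, `clflush`) are hypothetical. The model compares a dword write with and without the leading flush.

```python
# Toy model (hypothetical, illustration only) of a non-snooped GPU write
# racing with a CPU cache line. Assumes a write-back, write-allocate cache
# holding one 16-dword line, GPU writes that bypass the CPU cache, and a
# clflush that writes back a dirty line and then invalidates it.

LINE_DWORDS = 16


class ToySystem:
    def __init__(self):
        self.memory = [0] * LINE_DWORDS  # main memory backing one cache line
        self.cache = None                # CPU's cached copy, or None if invalid
        self.dirty = False

    def cpu_prefetch(self):
        # Step 1: pull the line into the CPU cache (a prefetch is enough).
        if self.cache is None:
            self.cache = list(self.memory)
            self.dirty = False

    def gpu_write(self, idx, value):
        # Step 2: the GPU updates memory. No snoop, so the CPU copy goes stale.
        self.memory[idx] = value

    def cpu_write(self, idx, value):
        # Step 3: write-allocate. Reload from memory only if the line is invalid.
        self.cpu_prefetch()
        self.cache[idx] = value
        self.dirty = True

    def clflush(self):
        # Step 4: write back the whole line if dirty, then invalidate it.
        if self.cache is not None and self.dirty:
            self.memory = list(self.cache)
        self.cache = None
        self.dirty = False


def write_without_leading_flush():
    s = ToySystem()
    s.cpu_prefetch()
    s.gpu_write(1, 0xAAAA)
    s.cpu_write(0, 0xBBBB)   # updates the stale cached copy
    s.clflush()              # the stale dword 1 clobbers the GPU's value
    return s.memory[:2]


def write_with_leading_flush():
    s = ToySystem()
    s.cpu_prefetch()
    s.gpu_write(1, 0xAAAA)
    s.clflush()              # line is clean, so this only invalidates it
    s.cpu_write(0, 0xBBBB)   # reloads the line, picking up the GPU's dword
    s.clflush()
    return s.memory[:2]


print([hex(v) for v in write_without_leading_flush()])
print([hex(v) for v in write_with_leading_flush()])
```

In this model, the first call ends with the GPU's dword lost (`['0xbbbb', '0x0']`). The second call keeps both writes (`['0xbbbb', '0xaaaa']`). This reproduces step 4's overwrite and shows why the leading flush matters when the line may already sit stale in the CPU cache. The model does not cover a GPU write that lands between the reload and the final flush. That window still needs ordering between CPU and GPU access.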