On Mon, 2013-01-14 at 22:51 +0100, Daniel Vetter wrote:
> On Mon, Jan 14, 2013 at 10:06 PM, Imre Deak <imre.deak at intel.com> wrote:
> > Well, the first flush would first write any valid data in the cache line
> > to memory and only then invalidate it. But this would have the same
> > result as just doing away with the first flush: the original cache line
> > updated by the dword.
> >
> > And if the cache line was invalid the first flush is a no-op and the
> > write would reload the cache line from memory as you pointed out.
>
> The thing is that coherency doesn't work that way if the other side
> doesn't send out snoop notices:
> 1. cpu reads the cacheline into its cache. Note that prefetch is good
>    enough for this.
> 2. gpu writes new values to the same location in memory, which updates
>    the main memory, but doesn't change anything in the cpu cache state.
> 3. cpu writes that dword, and updates its cacheline.
> 4. clflush writes that cacheline out to main memory.
>
> Note that the gpu write in 2 is now overwritten.
>
> I admit that it's really hard to come up with a real-world scenario
> involving relocations (since usually it's the cpu which has last
> written the batch, with the gpu only reading it in between). But for
> pwrite it's mandatory for correctness, so I don't want to take any
> changes.

Ok, I didn't think about GPU side writes for these buffers, but I guess
it's possible. Thanks for the explanation.

--Imre
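
For reference, the four-step sequence quoted above can be replayed with a toy model. This is only a sketch under stated assumptions, not the i915 code: a single 64-byte cacheline held as 16 dwords, a write-allocate CPU cache, and a GPU write that goes straight to memory without any snoop. The names ToyCache, gpu_write and run are hypothetical and exist only for this illustration.

```python
LINE_DWORDS = 16      # assumption: one 64-byte cacheline modelled as 16 dwords
CPU_VALUE = 0xC0DE    # illustrative dword written by the CPU (e.g. a relocation)
GPU_VALUE = 0xBEEF    # illustrative dword written by the GPU


class ToyCache:
    """A one-line CPU cache that is never told about device writes (no snooping)."""

    def __init__(self, memory):
        self.memory = memory
        self.line = None      # None means the cacheline is invalid
        self.dirty = False

    def _fill(self):
        # Load the line from memory only if the cache does not already hold it.
        if self.line is None:
            self.line = list(self.memory)

    def read(self, index):
        self._fill()
        return self.line[index]

    def write(self, index, value):
        self._fill()          # write-allocate: a miss reloads the line first
        self.line[index] = value
        self.dirty = True

    def clflush(self):
        # Write the whole line back if it is dirty, then invalidate it.
        if self.line is not None and self.dirty:
            self.memory[:] = self.line
        self.line = None
        self.dirty = False


def gpu_write(memory, index, value):
    """Non-snooped device write: updates memory, leaves the CPU cache untouched."""
    memory[index] = value


def run(flush_before_write):
    memory = [0] * LINE_DWORDS
    cache = ToyCache(memory)
    cache.read(0)                       # 1. cpu pulls the line in (a prefetch would do)
    gpu_write(memory, 1, GPU_VALUE)     # 2. gpu updates memory behind the cache
    if flush_before_write:
        cache.clflush()                 # extra flush drops the stale, clean line
    cache.write(0, CPU_VALUE)           # 3. cpu writes its dword
    cache.clflush()                     # 4. the whole line goes back to memory
    return memory


without_first_flush = run(flush_before_write=False)
with_first_flush = run(flush_before_write=True)
print("without first flush: gpu dword =", hex(without_first_flush[1]))
print("with first flush:    gpu dword =", hex(with_first_flush[1]))
```

In this model the stale line is clean after step 1, so the first flush writes nothing back and only invalidates it; the CPU write in step 3 then reloads the line, including the GPU's dword, from memory. Without the first flush, step 4 writes back the stale copy and the GPU's dword is lost.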