On Sun, 25 Mar 2012 19:47:42 +0200, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> The issue is that with inline clflushing the clflushing isn't properly
> swizzled. Fix this by
> - always clflushing entire 128 byte chunks and
> - unconditionally flush before writes when swizzling a given page.
>
> We could be clever and check whether we pwrite a partial 128 byte
> chunk instead of a partial cacheline, but I've figured that's not
> worth it.

There's some black magic here that I haven't fully grasped. We only ever
swizzle the gpu address (by whole cachelines), so why do we need to
invalidate a pair of cachelines for a single cacheline write?

Also we have a lot of assumptions that the cacheline is 64 bytes. Have we
tested on gen2 where the GPU cacheline is 32 bytes?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre