On Mon, Mar 26, 2012 at 10:18:39AM +0100, Chris Wilson wrote: > On Sun, 25 Mar 2012 19:47:42 +0200, Daniel Vetter <daniel.vetter at ffwll.ch> wrote: > > The issue is that with inline clflushing the clflushing isn't properly > > swizzled. Fix this by > > - always clflushing entire 128 byte chunks and > > - unconditionally flush before writes when swizzling a given page. > > We could be clever and check whether we pwrite a partial 128 byte > > chunk instead of a partial cacheline, but I've figured that's not > > worth it. > > There's some black magic here that I haven't fully grasped. We only ever > swizzle the gpu address (by whole cachelines), so why do we need to > invalidate a pair of cachelines for a single cacheline write? Well, we do swizzle when doing the actual copy_to|from_user, so strictly speaking we should also swizzle the clflushing in this case. No bit17 swizzling pwrite/pread is pretty much only around for backwards-compat with dead-old userspace, so I've figure I'll just unconditionally align the clflush range with even cachelines when bit17 swizzling is effective on the current page. Instead of adding a complex and rather untested swizzled clflush helper. > Also we have a lot of assumptions that the cacheline is 64 bytes. Have > we tested on gen2 where the GPU cacheline is 32 bytes? We're lucking out in that regard because gen2 doesn't do swizzling. At least my i855gm here, where I could run the corresponding i-g-t tests. And there's a comment in the code that i865g is unswizzled, too. So as long as people create dram controller where 64 bytes is the most effective transaction size, we should be fine. It'll be a fun day though when that changes. Otoh with gen5+ we don't have any bit17 swizzling nonsense anymore because the gpu is much more integrated with the cpu. I hope that trend continues and will prevent any bit17 madness in the future. -Daniel -- Daniel Vetter Mail: daniel at ffwll.ch