While looking at a recent benchmark on a non-LLC platform, Kristian noticed
that the amount of time simply spent clflushing buffers was not only
measurable, but dominating the profile. It's possible I'm oversimplifying the
problem, but it seems that in cases where we have a slow CPU, and where we
know the set of BOs is using all or most of the cache, wbinvd is the optimal
solution. The gains are about 3.5x FPS on the micro-benchmark with these
patches.

These patches attempt to provide a generic solution which could potentially
be used by other drivers. It can just as easily be implemented solely in
i915, and if that's what people find more desirable and safe, I am happy to
do that as well.

I wouldn't say these patches are ready for inclusion, as I haven't spent much
time testing or polishing them. I would like feedback on the general idea:
thoughts on figuring out when to switch over to wbinvd, and in particular
[as mentioned in patch 3] whether I even need to do the synchronized wbinvd.
(For the time being, I have convinced myself we can avoid it on i915, but I
am quite often wrong about such things; more details in the relevant patch.)

PPC-specific code is only compile tested.

Thanks.

Ben Widawsky (4):
  drm/cache: Use wbinvd helpers
  drm/cache: Try to be smarter about clflushing on x86
  drm/cache: Return what type of cache flush occurred
  drm/i915: Opportunistically reduce flushing at execbuf

 drivers/gpu/drm/drm_cache.c                | 54 +++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_drv.h            |  3 +-
 drivers/gpu/drm/i915/i915_gem.c            | 12 +++----
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 +++--
 drivers/gpu/drm/i915/intel_lrc.c           |  8 +++--
 include/drm/drmP.h                         | 13 +++++--
 6 files changed, 66 insertions(+), 32 deletions(-)
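
To give a feel for the direction, the x86 heuristic in patch 2 boils down to
a check along these lines. This is only a sketch: the helper name and the 3/4
cut-off are placeholders for whatever threshold we settle on, not necessarily
what the actual patches use.

#include <linux/mm.h>		/* PAGE_SHIFT */
#include <asm/processor.h>	/* boot_cpu_data */

/*
 * Sketch only. Returns true when flushing @num_pages line-by-line with
 * clflush would walk most of the CPU cache anyway, at which point a
 * full wbinvd is expected to be cheaper.
 */
static bool drm_cache_should_wbinvd(unsigned long num_pages)
{
	/* x86_cache_size is reported in KB; convert to pages. */
	unsigned long cache_pages =
		boot_cpu_data.x86_cache_size >> (PAGE_SHIFT - 10);

	/* Arbitrary cut-off: beyond ~3/4 of the cache, prefer wbinvd. */
	return num_pages >= cache_pages * 3 / 4;
}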