On Mon, Dec 15, 2014 at 08:55:32AM +0100, Daniel Vetter wrote: > On Sun, Dec 14, 2014 at 03:37:36PM -0800, Ben Widawsky wrote: > > On Sun, Dec 14, 2014 at 03:12:21PM +0200, Ville Syrjälä wrote: > > > On Sat, Dec 13, 2014 at 07:08:24PM -0800, Ben Widawsky wrote: > > > > If we're moving a bunch of buffers from the CPU domain to the GPU domain, and > > > > we've already blown out the entire cache via a wbinvd, there is nothing more to > > > > do. > > > > > > > > With this and the previous patches, I am seeing a 3x FPS increase on a certain > > > > benchmark which uses a giant 2d array texture. Unless I missed something in the > > > > code, it should only effect non-LLC i915 platforms. > > > > > > > > I haven't yet run any numbers for other benchmarks, nor have I attempted to > > > > check if various conformance tests still pass. > > > > > > > > NOTE: As mentioned in the previous patch, if one can easily obtain the largest > > > > buffer and attempt to flush it first, the results would be even more desirable. > > > > > > So even with that optimization if you only have tons of small buffers > > > that need to be flushed you'd still take the clflush path for every > > > single one. > > > > > > How difficult would it to calculate the total size to be flushed first, > > > and then make the clflush vs. wbinvd decision base on that? > > > > > > > I'll write the patch and send it to Eero for test. > > > > It's not hard, and I think that's a good idea as well. One reason I didn't put > > such code in this series is that moves away from a global DRM solution (and like > > I said in the cover-letter, I am fine with that). Implementing this, I think in > > the i915 code we'd just iterate through the BOs until we got to a certain > > threshold, then just call wbinvd() from i915 and not even both with drm_cache. > > You could also maybe try to shorcut if there are more than X buffers. > > I don't mind an i915 specific solution (we have them already in many > places). So will wait for the results of this experiments before merging > more patches. I actually think an i915 specific solution is required, as the making drm_clflush_pages autoselect is likely to cause regressions on unaware drivers (e.g. anything that has a reservation loop in execbuf like i915). I do think that execbuf regularly hitting clflush is something to be concerned about in userspace (even a wbinvd every execbuf is not going to be nice, even an mfence per execbuf shows up on profiles!), but we do have to occassionally flush large objects like framebuffers which probably would be faster using wbinvd. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx