On Sun, Mar 12, 2017 at 06:19:17PM +0100, David Weinehall wrote:
> On Sun, Mar 12, 2017 at 01:21:12PM +0000, Chris Wilson wrote:
> > On Fri, Mar 10, 2017 at 05:14:32PM -0800, Kenneth Graunke wrote:
> > > On systems without LLC, drm_intel_gem_bo_map_unsynchronized() has
> > > had the surprising behavior of doing a synchronized GTT mapping.
> > > This is obviously not what the user of the API wanted.
> > >
> > > Eric left a comment indicating a valid concern: if the CPU and GPU
> > > caches are incoherent, we don't keep track of where the user last
> > > mapped the buffer, and what caches might contain relevant data.
> >
> > Note this is an issue in libdrm_intel not tracking the cache domain
> > transitions. Even just a switch between cpu and coherent would solve the
> > majority of that - the caveat being shared bo where the tracking is
> > incomplete.
> >
> > > Modern Atom systems still don't have LLC, but they do offer snooping,
> > > which effectively makes the caches coherent. The kernel appears to
> > > set up the PTE/PPAT to enable snooping for everything where the cache
> > > level is not I915_CACHE_NONE. As far as I know, only scanout buffers
> > > are marked as uncached.
> >
> > Byt, bsw beg to differ. I don't have a bxt to know the results of the
> > igt/kernel tests.
>
> Just give me a list of the tests to run (and, if any, what patches
> to apply and the debugging level you want enabled) and I'll provide
> the necessary results.

The most important result is igt/gem_mmap_gtt/coherency. That tests
whether a write through the GTT is immediately visible in the backing
storage. (It should fail...)

To test the proposed use here, that GTT + snooping is ok, first requires
disabling the test forbidding GTT + snooping in i915_gem_fault. Then
similar tests to gem_exec_flush or directly from kselftests/coherency
can be used to spot if we need any flushes.

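For anyone unfamiliar with what such a coherency check looks like, here
is a rough model of its shape only. It is not the igt test and issues no
DRM ioctls: two MAP_SHARED views of a temporary file stand in for the
GTT mapping and the backing-storage (CPU) mapping, and the function name
count_incoherent_dwords() is made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Model only (assumption, not i915 code): view "gtt" and view "cpu" are
 * two shared mappings of the same temporary file. Write each dword
 * through one view and immediately read it back through the other.
 * Returns the number of mismatching dwords, or -1 on setup failure.
 */
static long count_incoherent_dwords(size_t size)
{
    assert(size > 0 && size % sizeof(uint32_t) == 0);

    FILE *f = tmpfile();
    if (!f)
        return -1;

    int fd = fileno(f);
    if (ftruncate(fd, (off_t)size) != 0) {
        fclose(f);
        return -1;
    }

    volatile uint32_t *gtt =
        mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    volatile uint32_t *cpu =
        mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (gtt == MAP_FAILED || cpu == MAP_FAILED) {
        if (gtt != MAP_FAILED)
            munmap((void *)gtt, size);
        if (cpu != MAP_FAILED)
            munmap((void *)cpu, size);
        fclose(f);
        return -1;
    }

    long bad = 0;
    size_t n = size / sizeof(uint32_t);
    for (size_t i = 0; i < n; i++) {
        uint32_t v = (uint32_t)i ^ 0xdeadbeefu;

        gtt[i] = v;        /* write through the first view */
        if (cpu[i] != v)   /* read back through the second view */
            bad++;
    }

    munmap((void *)gtt, size);
    munmap((void *)cpu, size);
    fclose(f);
    return bad;
}
```

On ordinary page-cache memory both views are coherent, so this model
should report no mismatches. The point of the real test is that the
GTT and CPU paths to a bo need not behave that way.
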
> > > Any buffers used by scanout should be flagged as non-reusable with
> > > drm_intel_bo_disable_reuse(), prime export, or flink. So, we can
> > > assume that any reusable buffer should be snooped.
> >
> > Not really, there is no reason why scanout buffers can't be reused.
> >
> > > This patch enables unsynchronized mappings for reusable buffers
> > > on all Gen6+ hardware (which have either LLC or snooping).
> > >
> > > On Broxton, this improves the performance of Unigine Valley 1.0
> > > on Low settings at 1280x720 by about 45%, and Unigine Heaven 4.0
> > > (same settings) by about 53%.
> >
> > Does anyone have figures for gtt performance on bxt - does it cover over
> > the same performance penalty from earlier atoms? Basically why bother to
> > enable this over wc mapping (no stalls for a contended, limited
> > resource) + detiling. (Just note that for detiling Y to WC you need to
> > use a temporary cacheable page, or rearrange the code to make sure the
> > reads/writes are in 64 byte chunks.)
>
> Again, I can run any tests you'd like to get numbers from,
> just give me a list.

gem_gtt_speed $obj_size will tell us the relative performance of
untiled/tiled GTT access vs WC/WB.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre