The history of this patch goes back quite a way. This time around, it builds on top of the map_unsynchronized support that Eric pushed; Eric's patch attempted to solve the problem only for LLC machines. Unlike my earlier versions of this patch (done with help from Daniel Vetter), we do not attempt to CPU-map objects in an unsynchronized manner.

The concept is fairly simple: once a buffer has moved into the GTT domain, we can assume it stays there unless we tell it otherwise (via a CPU map). It therefore stands to reason that as long as we can keep the object in the GTT domain, and never count on reading back its contents, things might just work. I believe that as long as we only do GTT mappings, we avoid having to worry about clflushing the dirtied cachelines, but that could use some fact checking.

The patch makes some assumptions about how the kernel does buffer tracking. This could be seen as an ABI dependency, but the behavior is actually fairly confined: the patch exploits the fact that BOs are only moved into the CPU domain under certain circumstances, and daintily dances around those conditions. The main assumption is that MADV_WILLNEED prevents the object from getting evicted.

I am not aware of a good way to measure its effectiveness performance-wise, but it introduces no regressions with piglit on my ILK or SNB.

Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
---
 intel/intel_bufmgr_gem.c | 46 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 34 insertions(+), 12 deletions(-)

diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index b776d2f..3872fbb 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -204,6 +204,11 @@ struct _drm_intel_bo_gem {
 	bool reusable;
 
 	/**
+	 * Boolean of whether or not we think this buffer is in the GTT domain.
+	 */
+	bool in_gtt_domain;
+
+	/**
 	 * Size in bytes of this buffer and its relocation descendents.
 	 *
 	 * Used to avoid costly tree walking in
@@ -1188,6 +1193,9 @@ static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 	if (write_enable)
 		bo_gem->mapped_cpu_write = true;
 
+	if (!bufmgr_gem->has_llc)
+		bo_gem->in_gtt_domain = false;
+
 	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
 	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_virtual, bo->size));
 	pthread_mutex_unlock(&bufmgr_gem->lock);
@@ -1292,6 +1300,7 @@ int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo)
 			    strerror(errno));
 	}
 
+	bo_gem->in_gtt_domain = true;
 	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
 	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->gtt_virtual, bo->size));
 	pthread_mutex_unlock(&bufmgr_gem->lock);
@@ -1311,27 +1320,39 @@ int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo)
  * data that the GPU is busy using (or, more specifically, that if it
  * does write over the data, it acknowledges that rendering is
  * undefined).
+ *
  */
-
 int
 drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
 {
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
 	int ret;
 
-	/* If the CPU cache isn't coherent with the GTT, then use a
-	 * regular synchronized mapping.  The problem is that we don't
-	 * track where the buffer was last used on the CPU side in
-	 * terms of drm_intel_bo_map vs drm_intel_gem_bo_map_gtt, so
-	 * we would potentially corrupt the buffer even when the user
-	 * does reasonable things.
-	 */
-	if (!bufmgr_gem->has_llc)
-		return drm_intel_gem_bo_map_gtt(bo);
-
 	pthread_mutex_lock(&bufmgr_gem->lock);
+
 	ret = map_gtt(bo);
-	pthread_mutex_unlock(&bufmgr_gem->lock);
+	if (ret)
+		goto out;
+
+	/* LLC + gtt makes this function quite safe. */
+	if (!bufmgr_gem->has_llc) {
+		if (!bo_gem->in_gtt_domain) {
+			/* Put the object in the gtt domain. We can assume the
+			 * object remains there unless set_domain happens, which
+			 * can only occur if the buffer is flinked, or we do it
+			 * ourselves. The first two can be managed in userspace,
+			 * and the last one works as long as we assume
+			 * MADV_WILLNEED prevents unbinding.
+			 */
+			ret = drm_intel_gem_bo_madvise_internal(bufmgr_gem, bo_gem, MADV_WILLNEED);
+			if (ret)
+				goto out;
+			drm_intel_gem_bo_start_gtt_access(bo, 1);
+		}
+	}
+
+out:
+	pthread_mutex_unlock(&bufmgr_gem->lock);
 
 	return ret;
 }
@@ -1506,6 +1527,7 @@ drm_intel_gem_bo_start_gtt_access(drm_intel_bo *bo, int write_enable)
 			    set_domain.read_domains, set_domain.write_domain,
 			    strerror(errno));
 	}
+	bo_gem->in_gtt_domain = true;
 }
 
 static void
-- 
1.7.10.4