On Thu, 2017-03-16 at 13:20 +0000, Chris Wilson wrote:
> If the user requires patching of their batch or auxiliary buffers, we
> currently make the alterations on the cpu. If they are active on the GPU
> at the time, we wait under the struct_mutex for them to finish executing
> before we rewrite the contents. This happens if shared relocation trees
> are used between different contexts with separate address space (and the
> buffers then have different addresses in each), the 3D state will need
> to be adjusted between execution on each context. However, we don't need
> to use the CPU to do the relocation patching, as we could queue commands
> to the GPU to perform it and use fences to serialise the operation with
> the current activity and future - so the operation on the GPU appears
> just as atomic as performing it immediately. Performing the relocation
> rewrites on the GPU is not free, in terms of pure throughput, the number
> of relocations/s is about halved - but more importantly so is the time
> under the struct_mutex.
>
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>

<SNIP>

>  	/* Must be a variable in the struct to allow GCC to unroll. */
> +	cache->gen = INTEL_GEN(i915);
>  	cache->has_llc = HAS_LLC(i915);
> -	cache->has_fence = INTEL_GEN(i915) < 4;
> -	cache->use_64bit_reloc = HAS_64BIT_RELOC(i915);
> +	cache->has_fence = cache->gen < 4;
> +	cache->use_64bit_reloc = cache->gen >= 8;

Keep using HAS_64BIT_RELOC(i915)...

> +static u32 *reloc_gpu(struct i915_execbuffer *eb,
> +		      struct i915_vma *vma,
> +		      unsigned int len)
> +{
> +	struct reloc_cache *cache = &eb->reloc_cache;
> +	u32 *cmd;
> +
> +	if (cache->rq_size > PAGE_SIZE/sizeof(u32) - (len + 1))

There's no chance of overflow here, so I'd write the check as
cache->rq_size + len + 1 > PAGE_SIZE / sizeof(u32).

> +		reloc_gpu_flush(cache);
> +
> +	if (!cache->rq) {
> +		struct drm_i915_gem_object *obj;
> +		struct drm_i915_gem_request *rq;
> +		struct i915_vma *batch;
> +		int err;
> +
> +		GEM_BUG_ON(vma->obj->base.write_domain & I915_GEM_DOMAIN_CPU);

Use ==.

I'll just close my eyes for the remainder of this function and expect
v2 to have a proper teardown :P

Also, the vma / obj pair naming differs from what those names usually
mean, so I'd consider renaming one of them to lessen confusion.

> @@ -1012,6 +1148,67 @@ relocate_entry(struct i915_vma *vma,
>  	bool wide = eb->reloc_cache.use_64bit_reloc;
>  	void *vaddr;
>
> +	if (!eb->reloc_cache.vaddr &&
> +	    (DBG_FORCE_RELOC == FORCE_GPU_RELOC ||
> +	     !reservation_object_test_signaled_rcu(obj->resv, true))) {
> +		const unsigned int gen = eb->reloc_cache.gen;
> +		unsigned int len;
> +		u32 *batch;
> +		u64 addr;
> +
> +		if (wide)
> +			len = offset & 7 ? 8 : 5;
> +		else if (gen >= 4)
> +			len = 4;
> +		else if (gen >= 3)
> +			len = 3;
> +		else /* On gen2 MI_STORE_DWORD_IMM uses a physical address */
> +			goto repeat;
> +
> +		batch = reloc_gpu(eb, vma, len);
> +		if (IS_ERR(batch))
> +			goto repeat;
> +
> +		addr = gen8_canonical_addr(vma->node.start + offset);
> +		if (wide) {
> +			if (offset & 7) {
> +				*batch++ = MI_STORE_DWORD_IMM_GEN4;

This indentation level calls for a new function, e.g. __relocate_entry_gpu().

Couldn't we share some of this STORE IMM code we have around? I don't
want to crawl the specs every time :(

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
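
For illustration, a minimal sketch of the extraction suggested in the review
above. It is not the code from the patch: the helper name __relocate_entry_gpu()
comes from the review comment, the target_addr parameter and the error-return
convention are illustrative, and everything past the first
MI_STORE_DWORD_IMM_GEN4 write (the address/immediate dwords using the standard
lower_32_bits()/upper_32_bits() helpers) is reconstructed, since the quoted
hunk ends there. Only identifiers visible in the hunk are reused otherwise.

/*
 * Sketch only: hoists the GPU relocation emission out of relocate_entry().
 * Returns 0 on success, or a negative error code when the GPU path cannot
 * be used and the caller should fall back to the CPU ("goto repeat") path.
 */
static int __relocate_entry_gpu(struct i915_execbuffer *eb,
				struct i915_vma *vma,
				u64 offset, u64 target_addr)
{
	const unsigned int gen = eb->reloc_cache.gen;
	const bool wide = eb->reloc_cache.use_64bit_reloc;
	unsigned int len;
	u32 *batch;
	u64 addr;

	/* Number of dwords to emit, as selected in the quoted hunk. */
	if (wide)
		len = offset & 7 ? 8 : 5;
	else if (gen >= 4)
		len = 4;
	else if (gen >= 3)
		len = 3;
	else /* On gen2 MI_STORE_DWORD_IMM uses a physical address */
		return -ENODEV;

	batch = reloc_gpu(eb, vma, len);
	if (IS_ERR(batch))
		return PTR_ERR(batch);

	addr = gen8_canonical_addr(vma->node.start + offset);
	if (wide && (offset & 7)) {
		/* Unaligned 64b relocation: two dword writes. */
		*batch++ = MI_STORE_DWORD_IMM_GEN4;
		*batch++ = lower_32_bits(addr);
		*batch++ = upper_32_bits(addr);
		*batch++ = lower_32_bits(target_addr);

		addr = gen8_canonical_addr(addr + 4);

		*batch++ = MI_STORE_DWORD_IMM_GEN4;
		*batch++ = lower_32_bits(addr);
		*batch++ = upper_32_bits(addr);
		*batch++ = upper_32_bits(target_addr);
	} else {
		/*
		 * Aligned qword and gen<8 dword variants elided here; these
		 * would ideally be shared with the existing STORE IMM
		 * emitters, as suggested above.
		 */
	}

	return 0;
}

relocate_entry() could then try this helper first and fall through to the
existing CPU path on error, keeping the deep nesting out of relocate_entry()
itself.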