On Fri, Nov 07, 2014 at 02:22:02PM -0800, bradley.d.volkin@xxxxxxxxx wrote:
> From: Brad Volkin <bradley.d.volkin@xxxxxxxxx>
>
> This patch sets up all of the tracking and copying necessary to
> use batch pools with the command parser and dispatches the copied
> (shadow) batch to the hardware.
>
> After this patch, the parser is in 'enabling' mode.
>
> Note that performance takes a hit from the copy in some cases
> and will likely need some work. At a rough pass, the memcpy
> appears to be the bottleneck. Without having done a deeper
> analysis, two ideas that come to mind are:
> 1) Copy sections of the batch at a time, as they are reached
>    by parsing. Might improve cache locality.
> 2) Copy only up to the userspace-supplied batch length and
>    memset the rest of the buffer. Reduces the number of reads.
>
> v2:
> - Remove setting the capacity of the pool
> - One global pool instead of per-ring pools
> - Replace batch_obj with shadow_batch_obj and hook into eb->vmas
> - Memset any space in the shadow batch beyond what gets copied
> - Rebased on execlist prep refactoring
>
> v3:
> - Rebase on chained batch handling
> - Squash in setting the secure dispatch flag
> - Add a note about the interaction w/secure dispatch pinning
> - Check for request->batch_obj == NULL in i915_gem_free_request
>
> v4:
> - Fix read domains for shadow_batch_obj
> - Remove the set_to_gtt_domain call from i915_parse_cmds
> - ggtt_pin/unpin in the parser block to simplify error handling
> - Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag
> - Remove i915_gem_batch_pool_put calls
>
> Signed-off-by: Brad Volkin <bradley.d.volkin@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c     | 79 +++++++++++++++++++++++-------
>  drivers/gpu/drm/i915/i915_dma.c            |  1 +
>  drivers/gpu/drm/i915/i915_drv.h            |  8 +++
>  drivers/gpu/drm/i915/i915_gem.c            |  2 +
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 49 ++++++++++++++++--
>  5 files changed, 117 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 809bb95..5a3f4e4 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -838,6 +838,56 @@ finish:
>  	return (u32*)addr;
>  }
>
> +/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
> +static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
> +		       struct drm_i915_gem_object *src_obj)
> +{
> +	int ret = 0;
> +	int needs_clflush = 0;
> +	u32 *src_addr, *dest_addr = NULL;
> +
> +	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
> +		return ERR_PTR(ret);
> +	}
> +
> +	src_addr = vmap_batch(src_obj);
> +	if (!src_addr) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> +		ret = -ENOMEM;
> +		goto unpin_src;
> +	}
> +
> +	if (needs_clflush)
> +		drm_clflush_virt_range((char *)src_addr, src_obj->base.size);
> +
> +	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to set batch CPU domain\n");
> +		goto unmap_src;
> +	}
> +
> +	dest_addr = vmap_batch(dest_obj);

Considering that this does a straightforward copy (if we ignore the
highly dubious memset), any chance of a kmap_atomic(); copy_page()
loop? (Modulo first/last pages which may not be aligned.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
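[Editor's note: for readers unfamiliar with the kmap_atomic()/copy_page() idea
being suggested, the following is a minimal userspace sketch of the loop shape
it implies. kmap_atomic() and copy_page() are kernel-only APIs, so memcpy()
stands in for the per-page copy here, and copy_batch_pagewise(),
batch_start_offset, and batch_len are hypothetical names chosen for
illustration, not code from the patch.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/*
 * Copy the batch one source page at a time, as a kmap_atomic()-per-page
 * loop would in the kernel.  The first and last pages may not be aligned
 * (the "modulo first/last pages" case), so each iteration copies at most
 * the bytes remaining in the current page.
 */
static void copy_batch_pagewise(uint8_t *dst, const uint8_t *src,
				size_t batch_start_offset, size_t batch_len)
{
	size_t pos = batch_start_offset;
	size_t end = batch_start_offset + batch_len;

	while (pos < end) {
		/* Bytes left in the current source page. */
		size_t in_page = PAGE_SIZE - (pos & (PAGE_SIZE - 1));
		size_t n = in_page < end - pos ? in_page : end - pos;

		/*
		 * In the kernel this span would be kmap_atomic()'d and,
		 * for a full aligned page, copied with copy_page().
		 */
		memcpy(dst + (pos - batch_start_offset), src + pos, n);
		pos += n;
	}
}
```

Only full, aligned interior pages would take the copy_page() fast path in a
real kernel version; the partial head and tail segments still need a plain
byte copy, which is what the `n` clamp above models.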