On Wed, Jun 18, 2014 at 10:49:05AM -0700, Volkin, Bradley D wrote: > On Wed, Jun 18, 2014 at 09:52:52AM -0700, Chris Wilson wrote: > > On Wed, Jun 18, 2014 at 09:36:14AM -0700, bradley.d.volkin@xxxxxxxxx wrote: > > > From: Brad Volkin <bradley.d.volkin@xxxxxxxxx> > > > > > > This patch sets up all of the tracking and copying necessary to > > > use batch pools with the command parser, but does not actually > > > dispatch the copied (shadow) batch to the hardware yet. We still > > > aren't quite ready to set the secure bit during dispatch. > > > > > > Note that performance takes a hit from the copy in some cases > > > and will likely need some work. At a rough pass, the memcpy > > > appears to be the bottleneck. Without having done a deeper > > > analysis, two ideas that come to mind are: > > > 1) Copy sections of the batch at a time, as they are reached > > > by parsing. Might improve cache locality. > > > 2) Copy only up to the userspace-supplied batch length and > > > memset the rest of the buffer. Reduces the number of reads. > > > > > > Signed-off-by: Brad Volkin <bradley.d.volkin@xxxxxxxxx> > > > --- > > > drivers/gpu/drm/i915/i915_cmd_parser.c | 75 ++++++++++++++++++++++------ > > > drivers/gpu/drm/i915/i915_drv.h | 7 ++- > > > drivers/gpu/drm/i915/i915_gem.c | 8 ++- > > > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 45 +++++++++++++++-- > > > drivers/gpu/drm/i915/i915_gem_render_state.c | 2 +- > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 12 +++++ > > > drivers/gpu/drm/i915/intel_ringbuffer.h | 7 +++ > > > 7 files changed, 134 insertions(+), 22 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c > > > index dea99d9..669afb0 100644 > > > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c > > > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c > > > @@ -831,6 +831,53 @@ finish: > > > return (u32*)addr; > > > } > > > > > > +/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */ > > > +static u32 *copy_batch(struct drm_i915_gem_object *dest_obj, > > > + struct drm_i915_gem_object *src_obj) > > > +{ > > > + int ret = 0; > > > + int needs_clflush = 0; > > > + u32 *src_addr, *dest_addr = NULL; > > > + > > > + ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush); > > > + if (ret) { > > > + DRM_DEBUG_DRIVER("CMD: failed to prep read\n"); > > > + return ERR_PTR(ret); > > > + } > > > + > > > + src_addr = vmap_batch(src_obj); > > > + if (!src_addr) { > > > + DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n"); > > > + ret = -ENOMEM; > > > + goto unpin_src; > > > + } > > > + > > > + if (needs_clflush) > > > + drm_clflush_virt_range((char *)src_addr, src_obj->base.size); > > > + > > > + ret = i915_gem_object_set_to_cpu_domain(dest_obj, true); > > > + if (ret) { > > > + DRM_DEBUG_DRIVER("CMD: Failed to set batch CPU domain\n"); > > > + goto unmap_src; > > > + } > > > + > > > + dest_addr = vmap_batch(dest_obj); > > > + if (!dest_addr) { > > > + DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n"); > > > + ret = -ENOMEM; > > > + goto unmap_src; > > > + } > > > + > > > + memcpy(dest_addr, src_addr, src_obj->base.size); > > > + > > > +unmap_src: > > > + vunmap(src_addr); > > > +unpin_src: > > > + i915_gem_object_unpin_pages(src_obj); > > > + > > > + return ret ? ERR_PTR(ret) : dest_addr; > > > +} > > > > src_obj->size? You should perhaps borrow a byt. > > Not sure I completely follow you here. The dest_obj is as big or bigger than > the source, so I think copying only as much data as exists in the source > object is right. I should be writing nops into any extra space though. And in > parse_cmds, I should update the batch_end calculation. Were those the issues > that you saw? Just that generally src->size >> batch len and clflush will make it twice as expensive on byt. (The object may be about 1000 times larger than the batch at the extreme, libva *cough*.) > > > static int > > > @@ -1087,6 +1088,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data, > > > struct drm_i915_private *dev_priv = dev->dev_private; > > > struct eb_vmas *eb; > > > struct drm_i915_gem_object *batch_obj; > > > + struct drm_i915_gem_object *shadow_batch_obj = NULL; > > > struct drm_clip_rect *cliprects = NULL; > > > struct intel_engine_cs *ring; > > > struct intel_context *ctx; > > > @@ -1288,8 +1290,31 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data, > > > batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND; > > > > > > if (i915_needs_cmd_parser(ring)) { > > > + shadow_batch_obj = > > > + i915_gem_batch_pool_get(ring->batch_pool, > > > + batch_obj->base.size); > > > + if (IS_ERR(shadow_batch_obj)) { > > > + ret = PTR_ERR(shadow_batch_obj); > > > + /* Don't try to clean up the obj in the error path */ > > > + shadow_batch_obj = NULL; > > > + > > > + /* > > > + * If the pool is at capcity, retiring requests might > > > + * return some buffers. > > > + */ > > > + if (ret == -EAGAIN) > > > + i915_gem_retire_requests_ring(ring); > > > + > > > + goto err; > > > + } > > > + > > > + ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 4096, 0); > > > + if (ret) > > > + goto err; > > > + > > > ret = i915_parse_cmds(ring, > > > batch_obj, > > > + shadow_batch_obj, > > > args->batch_start_offset, > > > file->is_master); > > > > You could just allocate the shadow inside parse_cmds and return it. Then > > replace the conventional batch_boj with the new pointer and add it to > > the eb list. Similarly, you do not need to track shadow_batch_obj > > explicitly in the requests, the pool can do its own busy tracking > > external to the core and reduce the invasiveness of the patch. > > Overall, I agree that this touched more places than I would have liked. > > I initially thought about replacing batch_obj and hooking into the eb list, > but that seemed like it would involve trickier code w.r.t. ref counting, > making up an exec_entry for the vma, etc. Not the end of the world, but > it felt less obvious. I can look again though if you think it's worth it. I think so. The request is already complicated enough and I think the shadow batch can fit neatly inside the rules we already have with a modicum of stitching here. > What specifically are you thinking in terms of implementing busy tracking > in the pool? The idea with adding the shadow object to the request was just > to get it back in the pool and purgeable asap. I also thought it would limit > some additional code given that we know buffers in the pool have had any > pending work completed. Maybe the suggested approach would do a better job > of those things though. I was thinking that a pool is basically a list of bo, and you simply query whether the oldest was still busy when we need a new bo. Which is the same as how userspace implements its pool of active/inactive objects. > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h > > > index e72017b..82aae29 100644 > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h > > > @@ -188,6 +188,13 @@ struct intel_engine_cs { > > > bool needs_cmd_parser; > > > > > > /* > > > + * A pool of objects to use as shadow copies of client batch buffers > > > + * when the command parser is enabled. Prevents the client from > > > + * modifying the batch contents after software parsing. > > > + */ > > > + struct i915_gem_batch_pool *batch_pool; > > > > Why are they tied to a ring? > > There was a question as to whether to make a global pool or per-ring > pools way back when I first sent the parser series. Daniel expressed > a slight preference for per-ring, but I don't object to a global pool. > I suppose a potential benefit to per-ring would be if userspace uses > different sized buffers on different rings, we'd more easily find a > suitable buffer. But enhancing the pool management could fix that too. Batch buffer sizes are not fixed per ring anyway. I guess Daniel thought it might work out easier, but memory is a global resource and should be tracked primarily at the device level. Userspace segregates its caches based on size to speed up searches. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx