Re: [PATCH 10/18] drm/i915: Unify legacy/execlists emission of MI_BATCHBUFFER_START

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 27, 2016 at 04:04:44PM +0100, Dave Gordon wrote:
> On 21/07/16 15:14, Chris Wilson wrote:
> >On Thu, Jul 21, 2016 at 04:39:58PM +0300, Joonas Lahtinen wrote:
> >>On ke, 2016-07-20 at 14:12 +0100, Chris Wilson wrote:
> >>> 	if (so.aux_batch_size > 8) {
> >>>-		ret = req->engine->dispatch_execbuffer(req,
> >>>-						     (so.ggtt_offset +
> >>>-						      so.aux_batch_offset),
> >>>-						     so.aux_batch_size,
> >>>-						     I915_DISPATCH_SECURE);
> >>>+		ret = req->engine->emit_bb_start(req,
> >>>+						 (so.ggtt_offset +
> >>>+						  so.aux_batch_offset),
> >>>+						 so.aux_batch_size,
> >>>+						 I915_DISPATCH_SECURE);
> >>> 		if (ret)
> >>> 			goto out;
> >>> 	}
> >>
> >>The code above this line is exact reason why I don't like the a->b->c
> >>(especially when there is repetition). But it's not new to this patch
> >>so guess it'll do. Some future work to shorten down a little bit might
> >>not hurt.
> >
> >I presume you mean req->engine->x here, not so.y. Is it just the depth
> >and saving 5 columns? Or something else?
> >-Chris
> 
> It can also hurt code efficiency. For example in
> 
> int i915_gem_render_state_init(struct drm_i915_gem_request *req)
> {
>     ret = i915_gem_render_state_prepare(req->engine, &so);
>     ...
>     ret = req->engine->dispatch_execbuffer(req, so.ggtt_offset,
>                 so.rodata->batch_items * 4, I915_DISPATCH_SECURE);
>     ...
>     if (so.aux_batch_size > 8) {
>         ret = req->engine->dispatch_execbuffer(req,
>                 (so.ggtt_offset + so.aux_batch_offset),
>                 so.aux_batch_size, I915_DISPATCH_SECURE);
>                 if (ret)
>                         goto out;
>         }
>     ...
> }
> 
> The compiler may not -- and in this case, *is not allowed to* --
> assume that req->engine is the same at each callsite. We have passed
> the non-const pointer "req" to callees, so it must assume that
> req->engine may have been changed and re-fetch it from memory.
> 
> By inspection of the generated code, we find that a local for a
> value that is used only once is a net loss, twice is breakeven, and
> three or more times is a definite win.
> 
> And it generally makes the code prettier too :)

Only prettiness really matters here, this emission can be shared for all
contexts with the cache only released under memory pressure. For when we
start caring about the regressions in the GL microbenchmarks...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux