Re: [PATCH 3/6] drm/i915: Split the batch pool by engine

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Thu, 19 Mar 2015 14:34:43 +0000

On Thu, Mar 19, 2015 at 02:01:37PM +0000, Tvrtko Ursulin wrote:
> 
> On 03/19/2015 12:04 PM, Chris Wilson wrote:
> >On Thu, Mar 19, 2015 at 11:58:17AM +0000, Tvrtko Ursulin wrote:
> >>How about retire all rings and then the inactive batch search with a
> >>global pool becomes only O(num_rings) at worst? Might be worth
> >>saving memory resource (multiple pools) vs. trivial traversal like
> >>that?
> >
> >There isn't a memory resource issue here though. The pool is made out of
> >easily reclaimable objects, and is ultimately limited by just how many
> >batches can be submitted whilst the GPU is active. The principal issue
> >is finding a new buffer to use for the next batch. Splitting by engine
> >is also likely to have nice secondary effects like grouping of batch
> >sizes.
> 
> True on the last bit, yes.
> 
> Also, I was under the wrong impression that only backing storage
> gets discarded and pool objects remain. Maybe it was like that some
> time back in some initial version, no idea now.
> 
> Anyway with this misconception cleared I agree resource problem is
> much smaller, although I still wonder how big or small exactly would
> be a difference in dynamic numbers of allocated pool batches between
> global-pool-but-fixed-inactive-lookup and per-ring-pool scenarios.

My guess is that the difference will really only be a few buffers. If you
have active buffers on a particular ring, you are likely to reuse them
again very shortly. So the only real question is how many inactive
buffers you have on that ring that you could be using on another
before they are reaped by the idle ring. And if you really wanted to,
you could always just search the other rings, which (modulo doing the
retire) would be a quick search because of the buckets + strict
ordering.
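
To illustrate the "search other rings" fallback, here is a rough userspace
sketch, not the driver code. It assumes each ring keeps its buffers in
strict LRU order (oldest first) and that a retire step has already updated
the active flag. The names pool_buf, ring_pool, find_inactive and NUM_RINGS
are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_RINGS 5 /* hypothetical ring count, for illustration only */

struct pool_buf {
    size_t size;
    bool active;           /* still busy on the GPU; assumed refreshed by a retire */
    struct pool_buf *next; /* next more recently used buffer */
};

struct ring_pool {
    struct pool_buf *lru;  /* head is the least recently used buffer */
};

/* Look for an idle buffer of at least 'size' bytes on a single ring. */
static struct pool_buf *ring_find_inactive(struct ring_pool *pool, size_t size)
{
    struct pool_buf *buf;

    for (buf = pool->lru; buf; buf = buf->next) {
        /* Strict LRU ordering: once we hit an active buffer,
         * everything after it is active as well, so stop early. */
        if (buf->active)
            break;
        if (buf->size >= size)
            return buf;
    }
    return NULL;
}

/* Try the requesting ring first, then fall back to the other rings. */
static struct pool_buf *find_inactive(struct ring_pool pools[NUM_RINGS],
                                      unsigned int ring, size_t size)
{
    unsigned int i;

    assert(ring < NUM_RINGS);
    for (i = 0; i < NUM_RINGS; i++) {
        struct pool_buf *buf =
            ring_find_inactive(&pools[(ring + i) % NUM_RINGS], size);
        if (buf)
            return buf;
    }
    return NULL;
}
```

The early break is what keeps the per-ring walk cheap: each walk looks only
at that ring's idle buffers, plus at most one active buffer.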

> Especially since you'll later add buckets and then per ring pools
> with buckets sounds not as optimal, both from design point of view
> and from resource usage point of view, as single bucketed pool with
> efficient object lookup would be.

The buckets have the same memory efficiency as the list. The tradeoff
there is greater static allocation (4 lists instead of 1) to avoid
having the test inside the iterator.
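
As a sketch of that tradeoff (again not the actual patch): with a handful
of list heads indexed by size class, the walk only ever touches buffers
from the right class, instead of filtering one long list inside the loop.
The power-of-two bucketing by page count and all names below are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12   /* assumed 4 KiB pages */
#define NUM_BUCKETS 4   /* "4 lists instead of 1" */

struct pool_buf {
    size_t size;            /* bytes, assumed to be at least one page */
    struct pool_buf *next;
};

struct batch_pool {
    struct pool_buf *bucket[NUM_BUCKETS];
};

/* floor(log2(pages)), clamped so that large buffers share the last bucket. */
static unsigned int bucket_index(size_t size)
{
    size_t pages = size >> PAGE_SHIFT;
    unsigned int n = 0;

    assert(pages > 0);
    while (pages > 1) {
        pages >>= 1;
        n++;
    }
    return n < NUM_BUCKETS ? n : NUM_BUCKETS - 1;
}

/* Return a buffer to the head of the bucket for its size class. */
static void pool_put(struct batch_pool *pool, struct pool_buf *buf)
{
    unsigned int n = bucket_index(buf->size);

    buf->next = pool->bucket[n];
    pool->bucket[n] = buf;
}

/* Unlink and return the first buffer in the matching bucket that fits. */
static struct pool_buf *pool_get(struct batch_pool *pool, size_t size)
{
    struct pool_buf **link = &pool->bucket[bucket_index(size)];

    for (; *link; link = &(*link)->next) {
        /* A bucket spans a range of sizes, so a fit check remains,
         * but buffers from other size classes are never visited. */
        if ((*link)->size >= size) {
            struct pool_buf *buf = *link;

            *link = buf->next;
            buf->next = NULL;
            return buf;
        }
    }
    return NULL;
}
```

The extra cost is just the fixed array of list heads per pool; the buffers
themselves take the same memory either way.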

Really the only drawback is that we don't allow buffers to cross between
rings, and so can end up with more temporarily unused buffers. OTOH, the
cmdparser is only for gen7 and is a PITA.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre