Re: [PATCH 2/2] drm/i915: Keep the per-object list of VMAs under control

On Mon, Feb 01, 2016 at 01:29:16PM +0000, Tvrtko Ursulin wrote:
> 
> On 01/02/16 11:12, Chris Wilson wrote:
> >On Mon, Feb 01, 2016 at 11:00:08AM +0000, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >>
> >>Where objects are shared across contexts and heavy rendering
> >>is in progress, the execlist retired request queue will grow
> >>unbounded until the GPU is idle enough for the retire worker
> >>to run and call intel_execlists_retire_requests.
> >>
> >>With some workloads, for example gem_close_race, that never
> >>happens, causing the shared object's VMA list to grow to epic
> >>proportions, which in turn causes retirement call sites to
> >>spend linearly more time walking obj->vma_list.
> >>
> >>The end result is the above-mentioned test case taking ten
> >>minutes to complete and using more than a GiB of RAM just for
> >>the VMA objects.
> >>
> >>If we instead trigger the execlist housekeeping a bit more
> >>often, obj->vma_list will be kept in check by virtue of
> >>context cleanup running and zapping the inactive VMAs.
> >>
> >>This makes the test case an order of magnitude faster and brings
> >>memory use back to normal.
> >>
> >>This also makes the code more self-contained since the
> >>intel_execlists_retire_requests call-site is now in a more
> >>appropriate place and implementation leakage is somewhat
> >>reduced.
> >
> >However, this then causes a perf regression since we unpin the contexts
> >too frequently and do not have any mitigation in place yet.
> 
> I suppose it is possible. What takes the most time - page table
> clears on VMA unbinds? It is just that this looks so bad at the
> moment. :( Luckily it is just the IGT...
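
To make the quoted patch description concrete, here is a minimal
sketch of the housekeeping being discussed, using simplified stand-in
names (struct intel_engine, execlist_retired_req_list,
intel_context_unpin and i915_gem_request_unreference are illustrative
here, not necessarily the exact symbols in the driver at the time):

static void execlists_retire_requests(struct intel_engine *engine)
{
        struct drm_i915_gem_request *req, *tmp;
        LIST_HEAD(retired);

        /* Splice the retired queue out under the lock, then
         * process it without holding the spinlock. */
        spin_lock_irq(&engine->execlist_lock);
        list_replace_init(&engine->execlist_retired_req_list, &retired);
        spin_unlock_irq(&engine->execlist_lock);

        list_for_each_entry_safe(req, tmp, &retired, execlist_link) {
                /* Unpinning the context here lets context cleanup
                 * run and zap inactive VMAs, keeping obj->vma_list
                 * bounded instead of waiting for the retire worker. */
                intel_context_unpin(req->ctx, engine);
                list_del(&req->execlist_link);
                i915_gem_request_unreference(req);
        }
}

Calling this from the regular retirement path, rather than only from
the retire worker, is what keeps the list in check under workloads
that never go idle.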

On Braswell in particular, where it is most noticeable, it is the
ioremaps. Note that we don't unbind the VMA on unpin, just make it
available for reallocation. The basic mitigation strategy that's been
sent in a couple of different forms is to defer the remapping from the
unpin to the vma unbind (and along the vmap paths from the unpin to the
put_pages).
Then the context unpin becomes just a matter of dropping a few
individual pin-counts and ref-counts on the various objects used by the
context.
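
A rough sketch of that deferral, again with hypothetical names
(struct vma, vma_unpin and vma_unbind are illustrative, as is the
cached iomap field; this is not the driver's actual API):

struct vma {
        atomic_t pin_count;
        void __iomem *iomap;    /* created lazily on first pinned use */
};

static void vma_unpin(struct vma *vma)
{
        /* Cheap: just drop the pin; keep the mapping cached so
         * a re-pin does not pay for another ioremap. */
        atomic_dec(&vma->pin_count);
}

static void vma_unbind(struct vma *vma)
{
        WARN_ON(atomic_read(&vma->pin_count));

        /* Expensive teardown happens once, here, not on every
         * unpin: release the mapping, clear the PTEs, remove the
         * vma from its address space. */
        if (vma->iomap) {
                iounmap(vma->iomap);
                vma->iomap = NULL;
        }
}

With that split, the context unpin reduces to the pin-count and
ref-count drops described above.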
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx



