Re: [PATCH 03/11] drm/i915: Stop tracking MRU activity on VMA

On 10/07/2018 13:37, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-07-10 13:19:51)

On 09/07/2018 14:02, Chris Wilson wrote:
Our goal is to remove struct_mutex and replace it with fine grained
locking. One of the thorny issues is our eviction logic for reclaiming
space for an execbuffer (or GTT mmaping, among a few other examples).
While eviction itself is easy to move under a per-VM mutex, performing
the activity tracking is less agreeable. One solution is not to do any
MRU tracking and do a simple coarse evaluation during eviction of
active/inactive, with a loose temporal ordering of last
insertion/evaluation. That keeps all the locking constrained to when we
are manipulating the VM itself, neatly avoiding the tricky handling of
possible recursive locking during execbuf and elsewhere.

Note that discarding the MRU is unlikely to impact upon our efficiency
to reclaim VM space (where we think a LRU model is best) as our
current strategy is to use random idle replacement first before doing
a search, and over time the use of softpinned 48b per-ppGTT is growing
(thereby eliminating any need to perform any eviction searches, in
theory at least).

It's a big change to eviction logic but also mostly affects GGTT which
is diminishing in significance. But we still probably need some real
world mmap_gtt users to benchmark and general 3d pre-ppgtt.

mmap_gtt is not really significant here as we use random replacement
almost exclusively for them.

You mean ggtt mmaps can be kicked out by someone doing random replacement, but mmap_gtt faults cannot necessarily kick out other ggtt vmas using random replacement. In that case today it falls back to LRU, so I think there is still something to verify if we want to be 100% nice.

The challenge is knowing such workloads. From memory, transcode jobs used to be quite heavy in this respect when there are many, many contexts and each uses some large mmap_gtt. I know the message was to move away from mmap_gtt, and I don't know if that has been done yet, but maybe there are other similar ones.

Or also GVT, with its reduced aperture space, will run eviction more often in its more constrained ggtt space. So anything running under there could show the effect amplified.

Any workload that relies on thrashing is a lost cause more or less; we
cannot implement an optimal eviction strategy (no foreknowledge), and
even MRU is suggested by some of the literature as not being a good
strategy for graphics (the evidence in games is that the MRU-reused
surfaces in a frame are typically use-once, whereas the older surfaces
are typically used at the start of each frame; it is those use-once
surfaces that then distort the MRU into making a bad prediction).

Fwiw, you will never get a demo with a >GTT working set that isn't a
slideshow, simply because we don't have the memory bw :-p
(Those machines have ~6GB/s main memory bw, a few fold higher while
in the LLC cache, but not enough to sustain ~180GB/s for the example,
even ignoring all the other ratelimiting steps.)

I agree with that.

diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 02b83a5ed96c..6a3608398d2a 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -127,14 +127,10 @@ i915_gem_evict_something(struct i915_address_space *vm,
       struct drm_i915_private *dev_priv = vm->i915;
       struct drm_mm_scan scan;
       struct list_head eviction_list;
-     struct list_head *phases[] = {
-             &vm->inactive_list,
-             &vm->active_list,
-             NULL,
-     }, **phase;
       struct i915_vma *vma, *next;
       struct drm_mm_node *node;
       enum drm_mm_insert_mode mode;
+     struct i915_vma *active;
       int ret;

       lockdep_assert_held(&vm->i915->drm.struct_mutex);
@@ -170,17 +166,31 @@ i915_gem_evict_something(struct i915_address_space *vm,
        */
       if (!(flags & PIN_NONBLOCK))
               i915_retire_requests(dev_priv);
-     else
-             phases[1] = NULL;


There are comments around here referring to active/inactive lists which
will need updating/removing.

   search_again:
+     active = NULL;
       INIT_LIST_HEAD(&eviction_list);
-     phase = phases;
-     do {
-             list_for_each_entry(vma, *phase, vm_link)
-                     if (mark_free(&scan, vma, flags, &eviction_list))
-                             goto found;
-     } while (*++phase);
+     list_for_each_entry_safe(vma, next, &vm->bound_list, vm_link) {
+             if (i915_vma_is_active(vma)) {
+                     if (vma == active) {
+                             if (flags & PIN_NONBLOCK)
+                                     break;
+
+                             active = ERR_PTR(-EAGAIN);
+                     }
+
+                     if (active != ERR_PTR(-EAGAIN)) {
+                             if (!active)
+                                     active = vma;
+
+                             list_move_tail(&vma->vm_link, &vm->bound_list);
+                             continue;
+                     }
+             }
+
+             if (mark_free(&scan, vma, flags, &eviction_list))
+                     goto found;
+     }

This loop needs a narrative to explain the process.

active/inactive doesn't cover it?

The comments still make sense to me explaining the code flow. There's
only the conceptual juggle that the two lists are combined into one.

I don't completely agree. There is talk of LRU and retirement order, etc., and talking about active and inactive is just misleading when, with this change, it is bound/pinned/active. This relates to the pre-existing comments.

For this particular loop I don't want the reader to think too much. A short comment shouldn't be too big of an ask for such an important loop.

For instance, "if (vma == active)" — when does this happen? I see "active = vma; continue;", which will walk to the next vma. So it is not immediately obvious. Okay, the loop modifies the list. So maybe s/active/first_active_vma/ for more self-documentation. But it is also important to say why the first active vma is interesting. There is no ordering/grouping of active/inactive with the proposed patch, so why is the first active vma important? Maybe it is a way to say we have walked the whole list and found nothing but active vmas. So my point: why not just say that in a comment, instead of anyone in the future who needs to understand it having to spend a few extra minutes figuring it out?


@@ -996,10 +993,8 @@ int i915_vma_move_to_active(struct i915_vma *vma,
        * add the active reference first and queue for it to be dropped
        * *last*.
        */
-     if (!i915_gem_active_isset(active) && !vma->active_count++) {
-             list_move_tail(&vma->vm_link, &vma->vm->active_list);

Is it not workable to bump it to the head of the bound list here?

See the opening statement of intent: killing the list_move here is
something that appears on the profiles for everyone but very rarely used
(and never for the softpinned cases).

You mean in the commit message? I don't see anything about the cost of this list_move_tail there. But it would mean locking in the following patches. Does that create the cost, or does it create a locking inversion problem?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



