Quoting Tvrtko Ursulin (2018-11-13 10:24:43) > > On 09/11/2018 11:44, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2018-11-09 07:30:34) > >> > >> On 08/11/2018 16:48, Chris Wilson wrote: > >>> Quoting Tvrtko Ursulin (2018-11-08 16:23:08) > >>>> > >>>> On 08/11/2018 08:17, Chris Wilson wrote: > >>>>> Ignore trying to shrink from i915 if we fail to acquire the struct_mutex > >>>>> in the shrinker while performing direct-reclaim. The trade-off being > >>>>> (much) lower latency for non-i915 clients at an increased risk of being > >>>>> unable to obtain a page from direct-reclaim without hitting the > >>>>> oom-notifier. The proviso being that we still keep trying to hard > >>>>> obtain the lock for oom so that we can reap under heavy memory pressure. > >>>>> > >>>>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >>>>> --- > >>>>> drivers/gpu/drm/i915/i915_gem_shrinker.c | 24 +++++++++++------------- > >>>>> 1 file changed, 11 insertions(+), 13 deletions(-) > >>>>> > >>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c > >>>>> index ea90d3a0d511..d461f458f4af 100644 > >>>>> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c > >>>>> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c > >>>>> @@ -36,7 +36,9 @@ > >>>>> #include "i915_drv.h" > >>>>> #include "i915_trace.h" > >>>>> > >>>>> -static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock) > >>>>> +static bool shrinker_lock(struct drm_i915_private *i915, > >>>>> + unsigned int flags, > >>>>> + bool *unlock) > >>>>> { > >>>>> switch (mutex_trylock_recursive(&i915->drm.struct_mutex)) { > >>>>> case MUTEX_TRYLOCK_RECURSIVE: > >>>>> @@ -45,15 +47,11 @@ static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock) > >>>>> > >>>>> case MUTEX_TRYLOCK_FAILED: > >>>>> *unlock = false; > >>>>> - preempt_disable(); > >>>>> - do { > >>>>> - cpu_relax(); > >>>>> - if (mutex_trylock(&i915->drm.struct_mutex)) { > >>>>> - *unlock = true; > >>>>> - break; > >>>>> - } > >>>>> - } while (!need_resched()); > >>>>> - preempt_enable(); > >>>>> + if (flags & I915_SHRINK_ACTIVE) { > >>>> > >>>> So until I915_SHRINK_ACTIVE, which is the last ditch attempt to shrink > >>>> in the normal case (direct reclaim?) or oom, we bail out on the first > >>>> sign of struct mutex contention. Doesn't this make our shrinker much > >>>> less effective at runtime and why is that OK? > >>> > >>> As I said, it's a tradeoff between blocking others for _several_ > >>> _seconds_ and making no progress and returning immediately and making no > >>> progress. My argument is along the lines of if direct-reclaim is running > >>> in another process and something else is engaged in the driver hopefully > >>> the driver will be cleaning up as it goes along or else what remains is > >>> active and won't be reaped anyway. If direct reclaim is failing, the > >>> delay before trying the oom path is insignificant. > >> > >> What was the rationale behind busy looping there btw? > > > > Emulating the optimistic spin for mutex (my patches to expose it from > > kernel/locking were kept hidden for public decency). My thinking was the > > exact opposite to this patch, that direct reclaim was of paramount > > importance and spending the time to try and ensure we grabbed the > > struct_mutex to search for some pages to free was preferable. > > > > It's just on the basis of looking at the actual syslatency and realising > > the cause is this spinner, I want to swing the axe in other direction. > > > > (There's probably a compromise, but honestly I'd prefer to sell the > > struct_mutex free version of the shrinker first :) > > > >> Compared to > >> perhaps an alternative of micro-sleeps and trying a few times? I know it > >> would be opposite from what this patch is trying to achieve, I Just > >> don't had a good judgment on what makes most sense for the shrinker. Is > >> it better to perhaps try a little bit harder instead of giving up > >> immediately, but try a little bit harder in a softer way? Or that ends > >> up blocking the callers and has the same effect of making no progress? > > > > Exactly. We can definitely measure the impact of the spinner on > > unrelated processes, but detecting the premature allocation failure is > > harder (we wait for more dmesg-warns). The compromise that I've tried to > > reach here is that if direct-reclaim isn't enough, then we should still > > try hard to grab the struct_mutex. (That leaves __GFP_RETRY_MAYFAIL > > vulnerable to not shrinking i915, but a worthwhile compromise as it's > > allowed to fail?) > > > >>>> Or in other words, for what use cases, tests or benchmark was the > >>>> existing approach of busy looping a problem? > >>> > >>> Do something like 'find / -exec cat' while running i915 and see how long > >>> you have to wait for a khungtaskd :| > >> > >> I couldn't reproduce anything strange with this. Assuming you meant > >> something like -exec cat { } \; >dev/null. > >> > >> Either case I think explanations like this should go into the commit > >> message. > > > > Weird, I spent so long over the last few weeks talking about the impact > > on gem_syslatency, I thought it was mentioned here. > > > >>> gem_syslatency reports max latencies of over 600s, and I'm sure it's > >>> pretty much unbounded. It's bad enough that I expect we are the cause of > >>> significant hitching (mainly in other tasks) on the desktop running at > >>> memory capacity. > >> > >> Running gem_syslatency in parallel to the find, or gem_syslatency -b in > >> parallel or not did not do anything. > > > > gem_syslatency -b -m > > > > +------------------------------------------------------------------------+ > > | x | > > | x | > > . > > . > > . > > | x | > > | x xxx xx x x x x x| > > ||_____M_A_______| | > > +------------------------------------------------------------------------+ > > N Min Max Median Avg Stddev > > x 120 2284928 6.752085e+08 3385097 20825362 80352645 > > > > Unpatched, and similar have been observed on ivb/byt/bdw/bsw/skl/kbl and > > while we were fighting ksoftirqd. 675s maximum latency for a RT thread to wake up. > > > > Patched, max is around 261ms and order of magnitude better than the best > > previously! > > Yes I have reproduced the ~20x better max latency myself. Mean latency > stays roughly the same but has less variance. From that point of view > this is very good. > > > > >> Then I tries gem_shrink but that doesn't seem to be making any progress > >> with or without the patch. But it's not even hitting the i915 hard, but > >> perhaps I need to turn off lockdep.. Which would be a bummer since I > >> wanted to have a lockdep to check the next bit.. > > > > gem_shrink never has completed... It's not quite the right test for this > > as we need to hit the shrinker & oom on both i915 and non-i915 paths to > > try and hit all the different lock chains. > > Ok. :) > > > > >>>>> + mutex_lock_nested(&i915->drm.struct_mutex, > >>>>> + I915_MM_SHRINKER); > >>>> > >>>> _nested is needed since abandoning trylock would otherwise cause lockdep > >>>> issues? But is it really safe? I don't know.. how can it be? It is > >>>> guaranteed to be a different process here otherwise the result wouldn't > >>>> be MUTEX_TRYLOCK_FAILED. > >>> > >>> The easy option was to only force the mutex_lock for kswapd. However, > >>> noting that we do need to forcibly shrink before oom, I opted for the > >>> more inclusive attempt to take it in both. For the oom approach, my only > >>> handwaving is we shouldn't be nested under oom serialisation and so > >>> avoid an ABBA deadlock. It's survived some oom abuse, but the machine > >>> still becomes unresponsive (but pingable) eventually (as it does before > >>> the patch). > >> > >> ...which I completely did not understand! :( > >> > >> When you say kswapd - that means mutex->owner != current in the shrinker > >> and otherwise it is direct reclaim? > >> > >> If true, how is the nested annotation correct for the kswapd case? > > > > We have 3 paths, > > direct-reclaim / oom from under i915 + struct_mutex > > direct-reclaim / oom from elsewher > kswapd > > Is the kernel guaranteeing only one reclaim path at a time can happen? > In other words if it is doing direct reclaim, kswapd won't enter the > same shrinking notifier and vice versa? direct-reclaim is parallelised; the shrink_slab is only guarded by a read lock (to prevent the shrinker lists being modified). Multiple callers into direct-reclaim is allowed, and the caller is not always guaranteed of obtaining the pages they themselves freed. Afaik, there's no lock there to cause contention and inversion for us. > > The nested annotation doesn't mean that the lock is nested per-se, it > > just means that here it is acting as a different subclass of the lock. > > Since the kswapd path can't have struct_mutex, it is safe from being > > confused. The complication is if there is serialisation on > > direct-reclaim (which I think can be run concurrently, i.e. no > > mutex/semaphore nesting) or oom. For oom, there is a lock but afaict the > > guard is just a trylock so shouldn't cause an ABBA deadlock if we oom > > under struct_mutex while fighting an oom in another process. > > When you say direct reclaim can run concurrently, in that case we would > have one thread potentially holding the struct_mutex and entering > shrinker. Second thread would grab the mutex in the shrinker with the > nested annotation. But it is not nested but overlapping. And you say > nested is effectively a hack to enable different lock classes. > > Actually in this case it is the same situation for any two simultaneous > threads in direct reclaim, even if none holds struct mutex on entry. But > in this case the lockdep annotation is also not needed. > > Hm.. maybe to turn the question on it's head.. if you had a normal > mutex_lock in there, what deadlock would lockdep notice? The direct-reclaim paths are explicitly marked up so as to avoid taking a mutex inside the shrinker that may be held while allocating. That's the contention we avoid by special casing the recursive mutex; but it also means we can't use a plain mutex_lock() for the cases where we are happy that we won't deadlock. Hence why previously it was a trylock to avoid the same lockdep warning. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx