Re: [PATCH] drm/i915: Ratelimit i915_globals_park

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Wed, 18 Dec 2019 13:31:35 +0000

On 18/12/2019 12:57, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2019-12-18 12:49:26)

On 18/12/2019 09:40, Chris Wilson wrote:
When doing our global park, we like to be a good citizen and shrink our
slab caches (of which we have quite a few now), but each
kmem_cache_shrink() incurs a stop_machine() and so ends up being quite
expensive, causing machine-wide stalls. While ideally we would like to
throw away unused pages in our slab caches whenever it appears that we
are idling, doing so will require a much cheaper mechanism. In the
meantime use a delayed worked to impose a rate-limit that means we have
to have been idle for more than 2 seconds before we start shrinking.

I was thinking about this yesterday, while looking at rapid park-unpark
cycles!

References: https://gitlab.freedesktop.org/drm/intel/issues/848
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
   drivers/gpu/drm/i915/i915_globals.c | 53 ++++++++++++++++++++++++-----
   1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c
index be127cd28931..3aa213684293 100644
--- a/drivers/gpu/drm/i915/i915_globals.c
+++ b/drivers/gpu/drm/i915/i915_globals.c
@@ -20,7 +20,10 @@ static LIST_HEAD(globals);
   static atomic_t active;
   static atomic_t epoch;
   static struct park_work {
-     struct rcu_work work;
+     struct delayed_work work;
+     struct rcu_head rcu;
+     unsigned long flags;
+#define PENDING 0
       int epoch;
   } park;
   
@@ -37,11 +40,33 @@ static void i915_globals_shrink(void)
               global->shrink();
   }
   
+static void __i915_globals_grace(struct rcu_head *rcu)
+{
+     /* Ratelimit parking as shrinking is quite slow */
+     schedule_delayed_work(&park.work, round_jiffies_up_relative(2 * HZ));
+}
+
+static void __i915_globals_queue_rcu(void)
+{
+     park.epoch = atomic_inc_return(&epoch);
+     if (!atomic_read(&active)) {
+             init_rcu_head(&park.rcu);

Do we need to do init/destroy more than once? I think once on driver
load/exit would be more correct since the head is statically allocated.

I did on each fresh invocation to have a better chance of detecting
wrong lifetimes. Because the first patch did call call_rcu() while
still active, I thought by having the debugobject init it would give us
the early warning that it was still alive. Similarly by calling
destroy_rcu as we transition to the delayed_work should catch if we
happen to be still active on rcu lists.

That was all I was thinking, it would give us better debug information.

Double init emits a warning? Okay, that works as well then.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx