Quoting Tvrtko Ursulin (2019-12-18 12:49:26) > > On 18/12/2019 09:40, Chris Wilson wrote: > > When doing our global park, we like to be a good citizen and shrink our > > slab caches (of which we have quite a few now), but each > > kmem_cache_shrink() incurs a stop_machine() and so ends up being quite > > expensive, causing machine-wide stalls. While ideally we would like to > > throw away unused pages in our slab caches whenever it appears that we > > are idling, doing so will require a much cheaper mechanism. In the > > meantime use a delayed worked to impose a rate-limit that means we have > > to have been idle for more than 2 seconds before we start shrinking. > > I was thinking about this yesterday, while looking at rapid park-unpark > cycles! > > > References: https://gitlab.freedesktop.org/drm/intel/issues/848 > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > --- > > drivers/gpu/drm/i915/i915_globals.c | 53 ++++++++++++++++++++++++----- > > 1 file changed, 44 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c > > index be127cd28931..3aa213684293 100644 > > --- a/drivers/gpu/drm/i915/i915_globals.c > > +++ b/drivers/gpu/drm/i915/i915_globals.c > > @@ -20,7 +20,10 @@ static LIST_HEAD(globals); > > static atomic_t active; > > static atomic_t epoch; > > static struct park_work { > > - struct rcu_work work; > > + struct delayed_work work; > > + struct rcu_head rcu; > > + unsigned long flags; > > +#define PENDING 0 > > int epoch; > > } park; > > > > @@ -37,11 +40,33 @@ static void i915_globals_shrink(void) > > global->shrink(); > > } > > > > +static void __i915_globals_grace(struct rcu_head *rcu) > > +{ > > + /* Ratelimit parking as shrinking is quite slow */ > > + schedule_delayed_work(&park.work, round_jiffies_up_relative(2 * HZ)); > > +} > > + > > +static void __i915_globals_queue_rcu(void) > > +{ > > + park.epoch = atomic_inc_return(&epoch); > > + if (!atomic_read(&active)) { > > + init_rcu_head(&park.rcu); > > Do we need to do init/destroy more than once? I think once on driver > load/exit would be more correct since the head is statically allocated. I did on each fresh invocation to have a better chance of detecting wrong lifetimes. Because the first patch did call call_rcu() while still active, I thought by having the debugobject init it would give us the early warning that it was still alive. Similarly by calling destroy_rcu as we transition to the delayed_work should catch if we happen to be still active on rcu lists. That was all I was thinking, it would give us better debug information. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx