Quoting Daniel Vetter (2018-09-12 15:47:21)
> On Wed, Sep 12, 2018 at 3:42 PM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> > Quoting Tvrtko Ursulin (2018-09-12 14:34:16)
> >>
> >> On 12/09/2018 12:11, Chris Wilson wrote:
> >> > If we try and fail to allocate a i915_request, we apply some
> >> > backpressure on the clients to throttle the memory allocations coming
> >> > from i915.ko. Currently, we wait until completely idle, but this is far
> >> > too heavy and leads to some situations where the only escape is to
> >> > declare a client hung and reset the GPU. The intent is to only ratelimit
> >> > the allocation requests, so we need only wait for a jiffie before using
> >> > the normal direct reclaim.
> >> >
> >> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106680
> >> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> >> > ---
> >> >  drivers/gpu/drm/i915/i915_request.c | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> >> > index 09ed48833b54..588bc5a4d18b 100644
> >> > --- a/drivers/gpu/drm/i915/i915_request.c
> >> > +++ b/drivers/gpu/drm/i915/i915_request.c
> >> > @@ -736,7 +736,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> >> >  		ret = i915_gem_wait_for_idle(i915,
> >> >  					     I915_WAIT_LOCKED |
> >> >  					     I915_WAIT_INTERRUPTIBLE,
> >> > -					     MAX_SCHEDULE_TIMEOUT);
> >> > +					     1);
> >> >  		if (ret)
> >> >  			goto err_unreserve;
> >> >
> >> >
> >>
> >> What is the remaining value of even trying to wait for idle, instead of
> >> maybe just i915_request_retire and sleep for a jiffie? The intention
> >> would potentially read clearer since it is questionable there is any
> >> relationship with idle and rate limiting clients. In fact, now that I
> >> think of it, waiting for idle is a nice way to starve an unlucky client
> >> forever.
> >
> > Better to starve the unlucky client, than to allow the entire system to
> > grind to a halt.
> >
> > One caveat to using RCU is that it is our responsibility to apply
> > backpressure as none is applied by the vm.
>
> So instead of 1 jiffie, should we wait for 1 rcu grace period? My
> understanding is that under very heavy load these can be extended
> (since batching is more effective for throughput, if you don't run out
> of memory). Just a random comment from the sidelines really :-)

There's (essentially) a wait for that later ;)

But a long time ago Paul did write a missive explaining that there should
be some use of cond_synchronize_rcu() to provide the rate-limiting, but
I've never been sure of a way to plug in the right figures. Do we say if
the timeline does more than two RCU allocations within the same grace
period it should be throttled? A light allocation failure has always
seemed to be a sensible trigger to start worrying.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
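[Editor's sketch] The cond_synchronize_rcu() shape discussed above could look roughly like the following. This is a hypothetical illustration, not the actual i915 code: the wrapper name, the `rcustate` field, and the `slab_requests` cache are assumptions; only `get_state_synchronize_rcu()` / `cond_synchronize_rcu()` are the real RCU APIs.

```c
/*
 * Hypothetical sketch: use an RCU grace-period cookie, rather than a
 * wait-for-idle, to ratelimit request allocation. Only a client that
 * hits a light allocation failure within the same grace period as its
 * previous allocation is made to wait.
 */
struct example_timeline {
	unsigned long rcustate;	/* cookie from get_state_synchronize_rcu() */
	/* ... */
};

static struct i915_request *
example_request_alloc(struct example_timeline *tl)
{
	struct i915_request *rq;

	/* First try a light allocation that will not trigger reclaim. */
	rq = kmem_cache_alloc(slab_requests,
			      GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
	if (!rq) {
		/*
		 * Light allocation failed: before falling back to direct
		 * reclaim, wait out the grace period recorded at the last
		 * successful allocation. If that grace period has already
		 * elapsed, cond_synchronize_rcu() returns immediately, so
		 * well-behaved clients are not penalised.
		 */
		cond_synchronize_rcu(tl->rcustate);

		rq = kmem_cache_alloc(slab_requests, GFP_KERNEL);
		if (!rq)
			return ERR_PTR(-ENOMEM);
	}

	/* Record the current grace period for the next allocation. */
	tl->rcustate = get_state_synchronize_rcu();

	return rq;
}
```

The point of the cookie is that it answers the "more than two RCU allocations within the same grace period" question mechanically: a second allocation inside the same grace period blocks until the period completes, while anything slower passes through untouched.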