On Wed, Sep 12, 2018 at 3:42 PM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote: > Quoting Tvrtko Ursulin (2018-09-12 14:34:16) >> >> On 12/09/2018 12:11, Chris Wilson wrote: >> > If we try and fail to allocate a i915_request, we apply some >> > backpressure on the clients to throttle the memory allocations coming >> > from i915.ko. Currently, we wait until completely idle, but this is far >> > too heavy and leads to some situations where the only escape is to >> > declare a client hung and reset the GPU. The intent is to only ratelimit >> > the allocation requests, so we need only wait for a jiffie before using >> > the normal direct reclaim. >> > >> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106680 >> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> >> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> >> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> >> > --- >> > drivers/gpu/drm/i915/i915_request.c | 2 +- >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> > >> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c >> > index 09ed48833b54..588bc5a4d18b 100644 >> > --- a/drivers/gpu/drm/i915/i915_request.c >> > +++ b/drivers/gpu/drm/i915/i915_request.c >> > @@ -736,7 +736,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx) >> > ret = i915_gem_wait_for_idle(i915, >> > I915_WAIT_LOCKED | >> > I915_WAIT_INTERRUPTIBLE, >> > - MAX_SCHEDULE_TIMEOUT); >> > + 1); >> > if (ret) >> > goto err_unreserve; >> > >> > >> >> What is the remaining value of even trying to wait for idle, instead of >> maybe just i915_request_retire and sleep for a jiffie? The intention >> would potentially read clearer since it is questionable there is any >> relationship with idle and rate limiting clients. In fact, now that I >> think of it, waiting for idle is a nice way to starve an unlucky client >> forever. > > Better to starve the unlucky client, than to allow the entire system to > grind to a halt. > > One caveat to using RCU is that it is our responsibility to apply > backpressure as none is applied by the vm. So instead of 1 jiffie, should we wait for 1 rcu grace period? My understanding is that under very heavy load these can be extended (since batching is more effective for throughput, if you don't run out of memory). Just a random comment from the sidelines really :-) -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx