On 23/01/2017 11:50, Chris Wilson wrote:
> On Mon, Jan 23, 2017 at 11:41:03AM +0000, Tvrtko Ursulin wrote:
>> On 23/01/2017 10:51, Chris Wilson wrote:
>>> On Mon, Jan 23, 2017 at 10:43:10AM +0000, Tvrtko Ursulin wrote:
>>>>> @@ -3285,6 +3291,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>>>>  	ret = i915_gem_object_wait(obj,
>>>>>  				   I915_WAIT_INTERRUPTIBLE |
>>>>>  				   I915_WAIT_LOCKED |
>>>>> +				   I915_WAIT_PRIORITY |
>>>>>  				   I915_WAIT_ALL,
>>>>>  				   MAX_SCHEDULE_TIMEOUT,
>>>>>  				   NULL);
>>>> As mentioned before, is this not a concern? Does it not let any
>>>> userspace boost its priority to max simply by calling set-cache-level
>>>> after execbuf?
>>> Not any more; the set-cache ioctl now does an explicit unlocked wait
>>> before hitting this wait. Also, the likely cause here is a page-flip
>>> after execbuf on a fresh bo, which is a stall we don't want.
>> Ok, I've missed that change.
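
(As an aside, for anyone following along: the two-phase wait referred to
above can be sketched in userspace roughly like this. Everything below is
a toy pthreads stand-in with made-up names, not the actual i915 entry
points - the point is only that the expensive wait happens with no lock
held, so the later locked wait, the one that would apply a priority bump,
is usually a no-op.)

```c
/* Illustrative sketch only, NOT i915 code: wait for the object to go
 * idle with no lock held first, so the second, locked wait is short. */
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

struct toy_obj {
	pthread_mutex_t dev_lock;	/* stands in for struct_mutex */
	bool busy;			/* stands in for pending requests */
};

/* Poll without holding dev_lock; real code would sleep on a waitqueue
 * instead of yielding. */
static void wait_idle_unlocked(struct toy_obj *obj)
{
	while (__atomic_load_n(&obj->busy, __ATOMIC_ACQUIRE))
		sched_yield();
}

int toy_set_cache_level(struct toy_obj *obj)
{
	wait_idle_unlocked(obj);	/* phase 1: unlocked wait, no boost */

	pthread_mutex_lock(&obj->dev_lock);
	wait_idle_unlocked(obj);	/* phase 2: short locked re-wait */
	/* ... actually change the cache level here ... */
	pthread_mutex_unlock(&obj->dev_lock);
	return 0;
}
```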
>>>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>>> @@ -2158,7 +2158,9 @@ static int wait_for_space(struct drm_i915_gem_request *req, int bytes)
>>>>>  		return -ENOSPC;
>>>>>  	timeout = i915_wait_request(target,
>>>>> -				    I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
>>>>> +				    I915_WAIT_INTERRUPTIBLE |
>>>>> +				    I915_WAIT_LOCKED |
>>>>> +				    I915_WAIT_PRIORITY,
>>>>>  				    MAX_SCHEDULE_TIMEOUT);
>>>> This one also looks worrying, unless I am missing something. Are we
>>>> allowing clients who fill the ring to promote their own priority?
>>> Yes. They only boost the priority of very, very old requests, and more
>>> importantly these clients are now stalling the entire *system* and not
>>> just themselves anymore. So there is an implicit priority inversion
>>> through struct_mutex. The only long-term solution is avoiding
>>> inter-client locks - we may still have inversion on any shared resource,
>>> most likely objects, but we can at least reduce the contention by
>>> splitting it up and avoiding struct_mutex.
>> How do you know they are stalling the entire system - haven't they
>> just filled up their own ringbuf? So the target request will be one of
>> theirs.
>>
>> struct_mutex is our BKL. I view anything taking it with suspicion,
>> because it stops other clients, pageflips, eventually everything.
>> Once the scheduler is able to do fair timeslicing or something similar,
>> we especially should not allow clients to prioritise themselves just
>> by filling up their ringbuf.
> That is still missing the impact on the system of holding struct_mutex
> for any period of time.
True, but I don't think priority boosting is a solution for that. I
think we should rather look at approaches where clients who fill up
their ringbufs wait outside struct_mutex instead.
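
Roughly what I have in mind, as a hypothetical userspace sketch (a
condvar standing in for request-retirement signalling; none of these
names or helpers are existing i915 code): the sleep happens with the
big lock dropped, and we re-check for space after retaking it.

```c
/* Hypothetical sketch, NOT i915 code: block for ring space without
 * holding the big lock while asleep. pthread_cond_wait() drops
 * ring->lock for the duration of the sleep and re-acquires it before
 * returning, so other clients are not blocked while we wait. */
#include <pthread.h>

struct toy_ring {
	pthread_mutex_t lock;	/* stands in for struct_mutex */
	pthread_cond_t space;	/* signalled as requests retire */
	int used, size;
};

int toy_wait_for_space(struct toy_ring *ring, int bytes)
{
	pthread_mutex_lock(&ring->lock);
	while (ring->size - ring->used < bytes)
		pthread_cond_wait(&ring->space, &ring->lock);
	ring->used += bytes;
	pthread_mutex_unlock(&ring->lock);
	return 0;
}

/* Called as requests retire to free space and wake waiters. */
void toy_retire(struct toy_ring *ring, int bytes)
{
	pthread_mutex_lock(&ring->lock);
	ring->used -= bytes;
	pthread_cond_broadcast(&ring->space);
	pthread_mutex_unlock(&ring->lock);
}
```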
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx