Re: [PATCH 3/7] drm/i915: Keep a count of requests submitted from userspace

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Mon, 9 Apr 2018 11:17:04 +0100

On 09/04/2018 10:25, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-04-09 10:11:53)

On 06/04/2018 21:17, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-04-05 13:39:19)
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

Keep a count of requests submitted from userspace and not yet runnable due
unresolved dependencies.

v2: Rename and move under the container struct. (Chris Wilson)
v3: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
   drivers/gpu/drm/i915/i915_request.c     | 3 +++
   drivers/gpu/drm/i915/intel_engine_cs.c  | 3 ++-
   drivers/gpu/drm/i915/intel_ringbuffer.h | 8 ++++++++
   3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5c01291ad1cc..152321655fe6 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -640,6 +640,7 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
                  rcu_read_lock();
                  request->engine->submit_request(request);
                  rcu_read_unlock();
+               atomic_dec(&request->engine->request_stats.queued);

But we use atomic here? Might as well use atomic for
request_stats.runnable here as well?

I admit it can read a bit uneven.

For runnable I wanted to avoid another atomic by putting it under the
engine timeline lock.

But for queued I did not want to start taking the same lock when adding
a request.

Your proposal to make runnable atomic_t and move to submit_notify would
indeed simplify things, but at a cost of one more atomic in that path.
Perhaps the code path is heavy enough for one new atomic to be
completely hidden in it, and code simplification to win?

It also solidifies that we are moving from one counter to the next.
(There must be some common 64b cmpxchg for doing that!) Going from +1
locked operations to +2 here isn't the end of the world, but I can
certainly appreciated trying to keep the number down (especially for aux
information like stats).

now = atomic64_read(&stats.queued_runnable);
do {
	old = now;
	new_queued = upper_32_bits(old) - 1;
	new_runnable = lower_32_bits(old) + 1;
	now = atomic64_cmpxchg(&stats.queued_runnable,
				old, (new_runnable | (u64)new_queued << 32));
} while (now != old);

Hm don't know, have to be careful with these retry loops. More 
importantly I am not sure if it isn't an overkill.

Downside being that we either then use atomic64 throughout or we mix
atomic32/atomic64 knowing that we're on x86. (I feel like someone else
must have solved this problem in a much neater way, before they went to
per-cpu stats ;)

Is the winky implying you know who and where? :) We have three potential 
solutions now, even for if the winky is suggesting something.

For me it is still a choice between what I have versus simplifying the 
code paths by going another atomic_t.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx