On 31/05/2021 08:53, Daniel Vetter wrote:
On Thu, May 20, 2021 at 4:28 PM Daniel Vetter <daniel@xxxxxxxx> wrote:
On Thu, May 20, 2021 at 08:35:14AM +0100, Matthew Auld wrote:
From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
The first tracepoint for a request is trace_dma_fence_init, called before
we have associated the request with a device. The tracepoint uses
fence->ops->get_driver_name() as a pretty name, and as we try to report
the device name there, this oopses because rq->engine is still NULL at
that point. Support the early tracepoint by reporting DRIVER_NAME instead
of the actual device name.
Note that rq->engine remains set during the course of request recycling
(SLAB_TYPESAFE_BY_RCU). For the physical engines the pointer remains
valid; however, a virtual engine may be destroyed after the request is
retired. If we process a preempt-to-busy completed request along the
virtual engine, we should make sure we mark the request as no longer
belonging to the virtual engine, removing the dangling pointers from the
tracepoints.
Why can't we assign the engine to the request beforehand? The idea behind
these tracepoints is that they actually match up; if trace_dma_fence_init
reports something different, then we're breaking that.
Ok, I looked a bit more and pondered this a bit, and the initial
tracepoint is called from dma_fence_init, where we haven't yet set up
rq->engine properly. So that part makes sense, but it should have a
bigger comment that explains this a bit more and why we can't solve
it in a neater way. Probably should also drop the unlikely(); this
isn't a performance-critical path, ever.
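Something roughly like this is what I have in mind (just a sketch of the
shape, not a demand for this exact wording; the comment text is mine):

static const char *i915_fence_get_driver_name(struct dma_fence *fence)
{
	struct i915_request *rq = to_request(fence);

	/*
	 * Requests are constructed piecewise: dma_fence_init() runs (and
	 * fires trace_dma_fence_init()) before the request has been
	 * attached to an engine, so rq->engine, and with it the device,
	 * may still be NULL here. Fall back to the module name for that
	 * early call.
	 */
	if (!rq->engine)
		return DRIVER_NAME;

	return dev_name(rq->engine->i915->drm.dev);
}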
The other changes, though, feel like they should be split out into a
separate patch, since they solve a conceptually totally different
issue: SLAB_TYPESAFE_BY_RCU recycling.
Hmm, I thought it all stems from having to tread very carefully around
SLAB_TYPESAFE_BY_RCU? If this were "normal" code, we would just allocate
the rq, initialise it properly, including the rq->engine, and only then
do the dma_fence_init? Or am I missing something?
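Roughly the ordering difference I mean, as a sketch (not the actual
construction path; slab_requests, ce, context and seqno are just
placeholders here):

	/* "Normal" order: fully initialise first, then publish the fence */
	rq = kmem_cache_alloc(slab_requests, GFP_KERNEL);
	rq->engine = ce->engine;	/* device is known from this point */
	dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock,
		       context, seqno);	/* trace_dma_fence_init() would
					 * then see a valid engine */

whereas today the fence is set up first (see 855e39e65cfc), so the very
first tracepoint fires while rq->engine is still NULL.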
I'm happy to split it though. And that bit at least fixes the
user-reported issue, I think.
And I'm honestly not sure whether that one is even correct; there's
another patch floating around that sprinkles rcu_read_lock() around
some of these accessors, and that would be a breakage of the dma_fence
interfaces, where outside of i915 RCU isn't required for this stuff.
So imo it should be split out and come with a wider analysis of what's
going on there, why, and how exactly i915 works.
In general SLAB_TYPESAFE_BY_RCU is extremely dangerous, and I'm
frankly not sure we have the perf data (outside of contrived
microbenchmarks) showing that it's needed and that it justifies all
the costs it's incurring.
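For reference, the generic SLAB_TYPESAFE_BY_RCU lookup pattern looks
roughly like the below (a generic sketch, not i915 code; struct obj,
obj_cache and obj_index are made up). It's the take-a-reference-then-
revalidate dance, needed because a freed object can be reused for a new
object of the same type before the reader notices, that makes this so
easy to get wrong:

#include <linux/kref.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/xarray.h>

struct obj {
	struct kref ref;
	u64 id;
};

static struct kmem_cache *obj_cache;	/* created with SLAB_TYPESAFE_BY_RCU */
static DEFINE_XARRAY(obj_index);

static void obj_release(struct kref *ref)
{
	struct obj *o = container_of(ref, struct obj, ref);

	xa_erase(&obj_index, o->id);
	kmem_cache_free(obj_cache, o);	/* slot may be reused immediately */
}

static struct obj *obj_lookup(u64 id)
{
	struct obj *o;

	rcu_read_lock();
	o = xa_load(&obj_index, id);
	if (o && !kref_get_unless_zero(&o->ref))
		o = NULL;		/* lost the race against the free */
	rcu_read_unlock();

	if (o && o->id != id) {		/* object was recycled under us */
		kref_put(&o->ref, obj_release);
		o = NULL;
	}
	return o;
}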
Right, I can try to search the git history.
-Daniel
-Daniel
Fixes: 855e39e65cfc ("drm/i915: Initialise basic fence before acquiring seqno")
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
Cc: Chintan M Patel <chintan.m.patel@xxxxxxxxx>
Cc: Andi Shyti <andi.shyti@xxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> # v5.7+
Signed-off-by: Matthew Auld <matthew.auld@xxxxxxxxx>
---
.../drm/i915/gt/intel_execlists_submission.c | 20 ++++++++++++++-----
drivers/gpu/drm/i915/i915_request.c | 7 ++++++-
2 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index de124870af44..75604e927d34 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3249,6 +3249,18 @@ static struct list_head *virtual_queue(struct virtual_engine *ve)
return &ve->base.execlists.default_priolist.requests;
}
+static void
+virtual_submit_completed(struct virtual_engine *ve, struct i915_request *rq)
+{
+ GEM_BUG_ON(!__i915_request_is_complete(rq));
+ GEM_BUG_ON(rq->engine != &ve->base);
+
+ __i915_request_submit(rq);
+
+ /* Remove the dangling pointer to the stale virtual engine */
+ WRITE_ONCE(rq->engine, ve->siblings[0]);
+}
+
static void rcu_virtual_context_destroy(struct work_struct *wrk)
{
struct virtual_engine *ve =
@@ -3265,8 +3277,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
old = fetch_and_zero(&ve->request);
if (old) {
- GEM_BUG_ON(!__i915_request_is_complete(old));
- __i915_request_submit(old);
+ virtual_submit_completed(ve, old);
i915_request_put(old);
}
@@ -3538,13 +3549,12 @@ static void virtual_submit_request(struct i915_request *rq)
/* By the time we resubmit a request, it may be completed */
if (__i915_request_is_complete(rq)) {
- __i915_request_submit(rq);
+ virtual_submit_completed(ve, rq);
goto unlock;
}
if (ve->request) { /* background completion from preempt-to-busy */
- GEM_BUG_ON(!__i915_request_is_complete(ve->request));
- __i915_request_submit(ve->request);
+ virtual_submit_completed(ve, ve->request);
i915_request_put(ve->request);
}
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 970d8f4986bb..aa124adb1051 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -61,7 +61,12 @@ static struct i915_global_request {
static const char *i915_fence_get_driver_name(struct dma_fence *fence)
{
- return dev_name(to_request(fence)->engine->i915->drm.dev);
+ struct i915_request *rq = to_request(fence);
+
+ if (unlikely(!rq->engine)) /* not yet attached to any device */
+ return DRIVER_NAME;
+
+ return dev_name(rq->engine->i915->drm.dev);
}
static const char *i915_fence_get_timeline_name(struct dma_fence *fence)
--
2.26.3
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch