Op 2021-03-11 om 22:22 schreef Jason Ekstrand: > First off, I'm just here asking questions right now trying to start > getting my head around some of this stuff. Feel free to ignore me or > tell me to go away if I'm being annoying. :-) > > On Thu, Mar 11, 2021 at 7:49 AM Maarten Lankhorst > <maarten.lankhorst@xxxxxxxxxxxxxxx> wrote: >> Instead of sharing pages with breadcrumbs, give each timeline a >> single page. This allows unrelated timelines not to share locks >> any more during command submission. >> >> As an additional benefit, seqno wraparound no longer requires >> i915_vma_pin, which means we no longer need to worry about a >> potential -EDEADLK at a point where we are ready to submit. >> >> Changes since v1: >> - Fix erroneous i915_vma_acquire that should be a i915_vma_release (ickle). >> - Extra check for completion in intel_read_hwsp(). >> Changes since v2: >> - Fix inconsistent indent in hwsp_alloc() (kbuild) >> - memset entire cacheline to 0. >> Changes since v3: >> - Do same in intel_timeline_reset_seqno(), and clflush for good measure. >> Changes since v4: >> - Use refcounting on timeline, instead of relying on i915_active. >> - Fix waiting on kernel requests. >> Changes since v5: >> - Bump amount of slots to maximum (256), for best wraparounds. >> - Add hwsp_offset to i915_request to fix potential wraparound hang. >> - Ensure timeline wrap test works with the changes. >> - Assign hwsp in intel_timeline_read_hwsp() within the rcu lock to >> fix a hang. >> Changes since v6: >> - Rename i915_request_active_offset to i915_request_active_seqno(), >> and elaborate the function. (tvrtko) >> >> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx> >> Reviewed-by: Thomas Hellström <thomas.hellstrom@xxxxxxxxx> #v1 >> Reported-by: kernel test robot <lkp@xxxxxxxxx> >> --- >> drivers/gpu/drm/i915/gt/gen2_engine_cs.c | 2 +- >> drivers/gpu/drm/i915/gt/gen6_engine_cs.c | 8 +- >> drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 13 +- >> drivers/gpu/drm/i915/gt/intel_engine_cs.c | 1 + >> drivers/gpu/drm/i915/gt/intel_gt_types.h | 4 - >> drivers/gpu/drm/i915/gt/intel_timeline.c | 422 ++++-------------- >> .../gpu/drm/i915/gt/intel_timeline_types.h | 17 +- >> drivers/gpu/drm/i915/gt/selftest_engine_cs.c | 5 +- >> drivers/gpu/drm/i915/gt/selftest_timeline.c | 83 ++-- >> drivers/gpu/drm/i915/i915_request.c | 4 - >> drivers/gpu/drm/i915/i915_request.h | 31 +- >> 11 files changed, 175 insertions(+), 415 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c >> index b491a64919c8..9646200d2792 100644 >> --- a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c >> +++ b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c >> @@ -143,7 +143,7 @@ static u32 *__gen2_emit_breadcrumb(struct i915_request *rq, u32 *cs, >> int flush, int post) >> { >> GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma); >> - GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR); >> + GEM_BUG_ON(offset_in_page(rq->hwsp_seqno) != I915_GEM_HWS_SEQNO_ADDR); >> >> *cs++ = MI_FLUSH; >> >> diff --git a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c >> index ce38d1bcaba3..b388ceeeb1c9 100644 >> --- a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c >> +++ b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c >> @@ -161,7 +161,7 @@ u32 *gen6_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs) >> PIPE_CONTROL_DC_FLUSH_ENABLE | >> PIPE_CONTROL_QW_WRITE | >> PIPE_CONTROL_CS_STALL); >> - *cs++ = i915_request_active_timeline(rq)->hwsp_offset | >> + *cs++ = i915_request_active_seqno(rq) | >> PIPE_CONTROL_GLOBAL_GTT; >> *cs++ = rq->fence.seqno; >> >> @@ -359,7 +359,7 @@ u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs) >> PIPE_CONTROL_QW_WRITE | >> PIPE_CONTROL_GLOBAL_GTT_IVB | >> PIPE_CONTROL_CS_STALL); >> - *cs++ = i915_request_active_timeline(rq)->hwsp_offset; >> + *cs++ = i915_request_active_seqno(rq); >> *cs++ = rq->fence.seqno; >> >> *cs++ = MI_USER_INTERRUPT; >> @@ -374,7 +374,7 @@ u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs) >> u32 *gen6_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs) >> { >> GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma); >> - GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR); >> + GEM_BUG_ON(offset_in_page(rq->hwsp_seqno) != I915_GEM_HWS_SEQNO_ADDR); >> >> *cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX; >> *cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT; >> @@ -394,7 +394,7 @@ u32 *gen7_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs) >> int i; >> >> GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma); >> - GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR); >> + GEM_BUG_ON(offset_in_page(rq->hwsp_seqno) != I915_GEM_HWS_SEQNO_ADDR); >> >> *cs++ = MI_FLUSH_DW | MI_INVALIDATE_TLB | >> MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX; >> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c >> index cac80af7ad1c..6b9c34d3ac8d 100644 >> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c >> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c >> @@ -338,15 +338,14 @@ static u32 preempt_address(struct intel_engine_cs *engine) >> >> static u32 hwsp_offset(const struct i915_request *rq) >> { >> - const struct intel_timeline_cacheline *cl; >> + const struct intel_timeline *tl; >> >> - /* Before the request is executed, the timeline/cachline is fixed */ >> + /* Before the request is executed, the timeline is fixed */ >> + tl = rcu_dereference_protected(rq->timeline, >> + !i915_request_signaled(rq)); > Why is Gen8+ different from Gen2/6 here? In particular, why not use > i915_request_active_timeline(rq) or, better yet, > i915_request_active_seqno()? The primary difference I see is that the > guard on the RCU is different but it's not immediately obvious to me > why this should be different between hardware generations. Also, > i915_request_active_seqno() returns a u32, but that could be fixed. The legacy rings different locking, and it was splatting on PROVE_RCU. I didn't want to weaken it, so I just put in a different condition for gen8 that's slightly weaker, but still better than the '1'. _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx