On Tue, Aug 02, 2016 at 10:42:47AM +0100, Dave Gordon wrote: > On 01/08/16 17:32, Chris Wilson wrote: > >On Mon, Aug 01, 2016 at 05:28:34PM +0300, Joonas Lahtinen wrote: > >>On ma, 2016-08-01 at 11:15 +0100, Chris Wilson wrote: > >>>On Mon, Aug 01, 2016 at 01:07:55PM +0300, Joonas Lahtinen wrote: > >>>> > >>>>On ma, 2016-08-01 at 10:10 +0100, Chris Wilson wrote: > >>>>> > >>>>>Space reservation is already safe with respect to the ring->size > >>>>>modulus, but hardware only expects to see values in the range > >>>>>0...ring->size-1 (inclusive) and so requires the modulus to prevent us > >>>>>writing the value ring->size instead of 0. As this is only required for > >>>>>the register itself, we can defer the modulus to the register update and > >>>>>not perform it after every command packet. We keep the > >>>>>intel_ring_advance() around in the code to provide demarcation for the > >>>>>end-of-packet (which then can be compared against intel_ring_begin() as > >>>>>the number of dwords emitted must match the reserved space). > >>>>> > >>>>>Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >>>>>Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > >>>>>Cc: Dave Gordon <david.s.gordon@xxxxxxxxx> > >>>>>--- > >>>>> drivers/gpu/drm/i915/intel_lrc.c | 2 +- > >>>>> drivers/gpu/drm/i915/intel_ringbuffer.c | 6 ++++-- > >>>>> drivers/gpu/drm/i915/intel_ringbuffer.h | 17 +++++++++++++---- > >>>>> 3 files changed, 18 insertions(+), 7 deletions(-) > >>>>> > >>>>>diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > >>>>>index bf42a66d6624..824f7efe4e64 100644 > >>>>>--- a/drivers/gpu/drm/i915/intel_lrc.c > >>>>>+++ b/drivers/gpu/drm/i915/intel_lrc.c > >>>>>@@ -373,7 +373,7 @@ static void execlists_update_context(struct drm_i915_gem_request *rq) > >>>>> struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt; > >>>>> uint32_t *reg_state = rq->ctx->engine[engine->id].lrc_reg_state; > >>>>> > >>>>>- reg_state[CTX_RING_TAIL+1] = rq->tail; > >>>>>+ reg_state[CTX_RING_TAIL+1] = intel_ring_offset(rq->ring, rq->tail); > >>>>> > >>>>> /* True 32b PPGTT with dynamic page allocation: update PDP > >>>>> * registers and point the unallocated PDPs to scratch page. > >>>>>diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > >>>>>index 3142085b5cc0..21d5e8209400 100644 > >>>>>--- a/drivers/gpu/drm/i915/intel_ringbuffer.c > >>>>>+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > >>>>>@@ -1718,7 +1718,8 @@ static void i9xx_submit_request(struct drm_i915_gem_request *request) > >>>>> { > >>>>> struct drm_i915_private *dev_priv = request->i915; > >>>>> > >>>>>- I915_WRITE_TAIL(request->engine, request->tail); > >>>>>+ I915_WRITE_TAIL(request->engine, > >>>>>+ intel_ring_offset(request->ring, request->tail)); > >>>>> } > >>>>> > >>>>> static void > >>>>>@@ -2505,7 +2506,8 @@ static void gen6_bsd_submit_request(struct drm_i915_gem_request *request) > >>>>> DRM_ERROR("timed out waiting for the BSD ring to wake up\n"); > >>>>> > >>>>> /* Now that the ring is fully powered up, update the tail */ > >>>>>- I915_WRITE_FW(RING_TAIL(request->engine->mmio_base), request->tail); > >>>>>+ I915_WRITE_FW(RING_TAIL(request->engine->mmio_base), > >>>>>+ intel_ring_offset(request->ring, request->tail)); > >>>>> POSTING_READ_FW(RING_TAIL(request->engine->mmio_base)); > >>>>> > >>>>> /* Let the ring send IDLE messages to the GT again, > >>>>>diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h > >>>>>index 14d2ea36fb88..9ac96ddb01ee 100644 > >>>>>--- a/drivers/gpu/drm/i915/intel_ringbuffer.h > >>>>>+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h > >>>>>@@ -460,14 +460,23 @@ static inline void intel_ring_emit_reg(struct intel_ring *ring, i915_reg_t reg) > >>>>> > >>>>> static inline void intel_ring_advance(struct intel_ring *ring) > >>>>> { > >>>>>+ /* Dummy function. > >>>>>+ * > >>>>>+ * This serves as a placeholder in the code so that the reader > >>>>>+ * can compare against the preceding intel_ring_begin() and > >>>>>+ * check that the number of dwords emitted matches the space > >>>>>+ * reserved for the command packet (i.e. the value passed to > >>>>>+ * intel_ring_begin()). > >>>>>+ */ > >>>>>+} > >>>>>+ > >>>>>+static inline u32 intel_ring_offset(struct intel_ring *ring, u32 value) > >>>>>+{ > >>>>> /* The modulus is required so that we avoid writing > >>>>> * request->tail == ring->size, rather than the expected 0, > >>>>> * into the RING_TAIL register as that can cause a GPU hang. > >>>>>- * As this is only strictly required for the request->tail, > >>>>>- * and only then as we write the value into hardware, we can > >>>>>- * one day remove the modulus after every command packet. > >>>>> */ > >>>>>- ring->tail &= ring->size - 1; > >>>>>+ return value & (ring->size - 1); > >>>>> } > >>>>The comment seems outdated-ish as it speaks of modulus which is nowhere > >>>>to be seen. I'd speak of 'masking'. With that, > >>>The operation is a modulus. The common optimisation for moduli with > >>>a power-of-two divisor being applied. > >> > >>Yes, I'm perfectly aware of that, but as discussed in IRC, that > >>information would be good to be local. Modulus in this case a trick to > >>achieve value == ring->size ? 0 : value; Which would again be more > >>informative, we do not expect wrap, but we have one special value. > > > >Ditch all the verbage, and go with the single line: > > /* Don't write ring->size (equivalent to 0) as that hangs some GPUs. */ > >that just fits into 80cols. > > > >Keep it to why not how and we need not argue about language. > >-Chris > > Well yes, except this function isn't writing anything to anywhere, > so it seems rather disconnected from the point of use. Perhaps > > /* ensure value (written to TAIL) is strictly less than ring->size */ > > which is even shorter :) > > BTW, a completely different way to avoid this problem would just be > to ensure the packet never finished at the last word of the ring. > Setting effective_size = size-1 would do just that! > > If we set effective_size < size for engines that don't like TAIL == > ring->size (as well as the ones that don't like the last cacheline) > we could then remove the don't-wrap-midsequence restriction from the > others, which would make better use of the ring space :) We can't just remove that as we still have the limitation that we can't split instructions, and the intel_ring_begin() is all the information we have at that point regarding instruction length. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx