On Fri, Jan 22, 2016 at 02:51:50PM +0000, Dave Gordon wrote: > On 22/01/16 10:28, Mika Kuoppala wrote: > >Dave Gordon <david.s.gordon@xxxxxxxxx> writes: > > > >>Existing code did a kunmap() on the wrong page, and didn't account for the > >>fact that the HWSP and the default context are different offsets within the > >>same GEM object. > >> > > > >Just to note here that usually we try to split bug fixes early in the > >series with minimal code changes. This helps cherrypickups > >to fixes/stable. And also lessens the probability that we accidentally > >revert bugfix when some other problem occurs with the patch. > > OK, let's drop this one from the series (they're orthogonal code changes, > only related by the fact they're all to do with init/fini), and I'll > resubmit it separately. > > Please continue this series from 2/4 ... > > >>This patch improves symmetry by creating lrc_teardown_hardware_status_page() > >>to complement lrc_setup_hardware_status_page(), making sure that they both > >>take the same argument (pointer to engine). They encapsulate all the details > >>of the relationship between the default context and the status page, so the > >>rest of the LRC code doesn't have to know about it. > >> > >>Signed-off-by: Dave Gordon <david.s.gordon@xxxxxxxxx> > >>--- > >> drivers/gpu/drm/i915/intel_lrc.c | 57 ++++++++++++++++++++++++++++------------ > >> 1 file changed, 40 insertions(+), 17 deletions(-) > >> > >>diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > >>index 73d4347..3914120 100644 > >>--- a/drivers/gpu/drm/i915/intel_lrc.c > >>+++ b/drivers/gpu/drm/i915/intel_lrc.c > >>@@ -226,9 +226,8 @@ enum { > >> #define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT 0x17 > >> > >> static int intel_lr_context_pin(struct drm_i915_gem_request *rq); > >>-static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring, > >>- struct drm_i915_gem_object *default_ctx_obj); > >>- > >>+static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring); > >>+static void lrc_teardown_hardware_status_page(struct intel_engine_cs *ring); > >> > >> /** > >> * intel_sanitize_enable_execlists() - sanitize i915.enable_execlists > >>@@ -1536,8 +1535,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring) > >> struct drm_i915_private *dev_priv = dev->dev_private; > >> u8 next_context_status_buffer_hw; > >> > >>- lrc_setup_hardware_status_page(ring, > >>- dev_priv->kernel_context->engine[ring->id].state); > >>+ lrc_setup_hardware_status_page(ring); > >> > >> I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask)); > >> I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff); > >>@@ -1992,10 +1990,9 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring) > >> i915_cmd_parser_fini_ring(ring); > >> i915_gem_batch_pool_fini(&ring->batch_pool); > >> > >>- if (ring->status_page.obj) { > >>- kunmap(sg_page(ring->status_page.obj->pages->sgl)); > >>- ring->status_page.obj = NULL; > >>- } > >>+ /* Status page should have gone already */ > >>+ WARN_ON(ring->status_page.page_addr); > >>+ WARN_ON(ring->status_page.obj); > >> > >> ring->disable_lite_restore_wa = false; > >> ring->ctx_desc_template = 0; > >>@@ -2434,6 +2431,11 @@ void intel_lr_context_free(struct intel_context *ctx) > >> continue; > >> > >> if (ctx == ctx->i915->kernel_context) { > >>+ /* > >>+ * The HWSP is part of the default context > >>+ * object in LRC mode. > >>+ */ > >>+ lrc_teardown_hardware_status_page(ringbuf->ring); > >> intel_unpin_ringbuffer_obj(ringbuf); > >> i915_gem_object_ggtt_unpin(ctx_obj); > >> } > >>@@ -2482,24 +2484,45 @@ uint32_t intel_lr_context_size(struct intel_engine_cs *ring) > >> return ret; > >> } > >> > >>-static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring, > >>- struct drm_i915_gem_object *default_ctx_obj) > >>+static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring) > >> { > >>- struct drm_i915_private *dev_priv = ring->dev->dev_private; > >>+ struct drm_i915_private *dev_priv = to_i915(ring->dev); > >>+ struct intel_context *dctx = dev_priv->kernel_context; > >>+ struct drm_i915_gem_object *dctx_obj = dctx->engine[ring->id].state; > >>+ u64 dctx_addr = i915_gem_obj_ggtt_offset(dctx_obj); > >> struct page *page; > >> > >>- /* The HWSP is part of the default context object in LRC mode. */ > >>- ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(default_ctx_obj) > >>- + LRC_PPHWSP_PN * PAGE_SIZE; > >>- page = i915_gem_object_get_page(default_ctx_obj, LRC_PPHWSP_PN); > >>+ /* > >>+ * The HWSP is part of the default context object in LRC mode. > >>+ * Note that it doesn't contribute to the refcount! > >>+ */ > >>+ page = i915_gem_object_get_page(dctx_obj, LRC_PPHWSP_PN); > >> ring->status_page.page_addr = kmap(page); > >>- ring->status_page.obj = default_ctx_obj; > >>+ ring->status_page.gfx_addr = dctx_addr + LRC_PPHWSP_PN * PAGE_SIZE; > >>+ ring->status_page.obj = dctx_obj; > >> > >> I915_WRITE(RING_HWS_PGA(ring->mmio_base), > >> (u32)ring->status_page.gfx_addr); > >> POSTING_READ(RING_HWS_PGA(ring->mmio_base)); > >> } > >> > >>+/* This should be called before the default context is destroyed */ > >>+static void lrc_teardown_hardware_status_page(struct intel_engine_cs *ring) > >>+{ > >>+ struct drm_i915_gem_object *dctx_obj = ring->status_page.obj; > >>+ struct page *page; > >>+ > >>+ WARN_ON(!dctx_obj); > >>+ > >>+ if (ring->status_page.page_addr) { > >>+ page = i915_gem_object_get_page(dctx_obj, LRC_PPHWSP_PN); > >>+ kunmap(page); > >>+ ring->status_page.page_addr = NULL; > >>+ } > >>+ > > > >As the page is always kmapped along with setting of status_page.obj, > >I fail to see why we need the check here for the page_addr. > > The WARN_ON() tells the developer something unexpected happened. > > The page_addr test avoids doing something that may cause an OOPS so that the > driver can continue (to unload!). > > IMHO, teardown code *in particular* should be written to anticipate being > called in all sorts of inconsistent states, because a common error handling > strategy is just to say, > > oops, something bad happened at some stage of setup! > let's tear down everything and pass the error back > > which is certainly easier than tracking exactly where the error occurred and > undoing only the steps known to have succeeded -- that leads to lots of > "goto err_1", "goto err_2", etc :( Russian dolls error code unwinding is pretty much standard kernel practice. And since it's also mostly dead code (well in practice) it's not too much of a validation horror show. This will likely change a lot if we start using EPROBE_DEFER in earnest. This comment is just for context, i.e. I think this is totally fine. It's a bit fragile, but you'll be fighting against a _lot_ of inertia with this ;-) Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx