Re: [PATCH 4/4] drm/i915: Late request cancellations are harmful

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 09/04/16 10:27, Chris Wilson wrote:
Conceptually, each request is a record of a hardware transaction - we
build up a list of pending commands and then either commit them to
hardware, or cancel them. However, whilst building up the list of
pending commands, we may modify state outside of the request and make
references to the pending request. If we do so and then cancel that
request, external objects then point to the deleted request leading to
both graphical and memory corruption.

The easiest example is to consider object/VMA tracking. When we mark an
object as active in a request, we store a pointer to this, the most
recent request, in the object. Then we want to free that object, we wait
for the most recent request to be idle before proceeding (otherwise the
hardware will write to pages now owned by the system, or we will attempt
to read from those pages before the hardware is finished writing). If
the request was cancelled instead, that wait completes immediately. As a
result, all requests must be committed and not cancelled if the external
state is unknown.

This was a bit hard to figure out.

So we cannot unwind because once we set last_read_req we lose the data on what was the previous one, before this transaction started?

Intuitively I don't like the idea of sending unfinished stuff to the
GPU, when it failed at some random point in ring buffer preparation.

So I am struggling with reviewing this as I have in the previous round.

All that remains of i915_gem_request_cancel() users are just a couple of
extremely unlikely allocation failures, so remove the API entirely.

This parts feels extra weird because in the non-execbuf cases we actually can cancel the transaction without any issues, correct?

Would middle-ground be to keep the cancellations for in-kernel submits, and for execbuf rewind the ringbuf so only request post-amble is sent to the GPU?

A consequence of committing all incomplete requests is that we generate
excess breadcrumbs and fill the ring much more often with dummy work. We
have completely undone the outstanding_last_seqno optimisation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
---
  drivers/gpu/drm/i915/i915_drv.h            |  2 --
  drivers/gpu/drm/i915/i915_gem.c            | 50 ++++++++++++------------------
  drivers/gpu/drm/i915/i915_gem_context.c    | 21 ++++++-------
  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 15 +++------
  drivers/gpu/drm/i915/intel_display.c       |  2 +-
  drivers/gpu/drm/i915/intel_lrc.c           |  4 +--
  drivers/gpu/drm/i915/intel_overlay.c       |  8 ++---
  7 files changed, 39 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a93e5dd4fa9a..f374db8de673 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2320,7 +2320,6 @@ struct drm_i915_gem_request {
  struct drm_i915_gem_request * __must_check
  i915_gem_request_alloc(struct intel_engine_cs *engine,
  		       struct intel_context *ctx);
-void i915_gem_request_cancel(struct drm_i915_gem_request *req);
  void i915_gem_request_free(struct kref *req_ref);
  int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
  				   struct drm_file *file);
@@ -2872,7 +2871,6 @@ int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
  			     struct drm_file *file_priv);
  void i915_gem_execbuffer_move_to_active(struct list_head *vmas,
  					struct drm_i915_gem_request *req);
-void i915_gem_execbuffer_retire_commands(struct i915_execbuffer_params *params);
  int i915_gem_ringbuffer_submission(struct i915_execbuffer_params *params,
  				   struct drm_i915_gem_execbuffer2 *args,
  				   struct list_head *vmas);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1c3ff56594d6..42227495803f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2753,7 +2753,8 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
  		 * fully prepared. Thus it can be cleaned up using the proper
  		 * free code.
  		 */
-		i915_gem_request_cancel(req);
+		intel_ring_reserved_space_cancel(req->ringbuf);
+		i915_gem_request_unreference(req);
  		return ret;
  	}

@@ -2790,13 +2791,6 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
  	return err ? ERR_PTR(err) : req;
  }

-void i915_gem_request_cancel(struct drm_i915_gem_request *req)
-{
-	intel_ring_reserved_space_cancel(req->ringbuf);
-
-	i915_gem_request_unreference(req);
-}
-
  struct drm_i915_gem_request *
  i915_gem_find_active_request(struct intel_engine_cs *engine)
  {
@@ -3410,12 +3404,9 @@ int i915_gpu_idle(struct drm_device *dev)
  				return PTR_ERR(req);

  			ret = i915_switch_context(req);
-			if (ret) {
-				i915_gem_request_cancel(req);
-				return ret;
-			}
-
  			i915_add_request_no_flush(req);
+			if (ret)
+				return ret;

Looks like with this it could execute the context switch on the GPU but not update the engine->last_context in do_switch().

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux