Re: [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 18/10/2016 09:17, Chris Wilson wrote:
On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote:
On 17/10/2016 15:10, Chris Wilson wrote:
When handling execbuf relocations, we play a delicate dance with
pagefault. We first try to access the user pages underneath our
struct_mutex. However, if those pages were inside a GEM object, we may
trigger a pagefault and deadlock as i915_gem_fault() tries to
recursively acquire struct_mutex. Instead, we choose to disable
pagefaulting around the copy_from_user whilst inside the struct_mutex
and handle the EFAULT by falling back to a copy outside the
struct_mutex.

We however presumed that disabling pagefaults would be expensive. It is
just an operation on the local current task. Cheap enough that we can
restrict the disable/enable to the critical section around the copy, and
so avoid having to handle the atomic sections within the relocation
handling itself.

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
---
  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
  1 file changed, 28 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 1d02e74ce62d..22342ad0e07f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -551,20 +551,6 @@ repeat:
  	return 0;
  }
-static bool object_is_idle(struct drm_i915_gem_object *obj)
-{
-	unsigned long active = i915_gem_object_get_active(obj);
-	int idx;
-
-	for_each_active(active, idx) {
-		if (!i915_gem_active_is_idle(&obj->last_read[idx],
-					     &obj->base.dev->struct_mutex))
-			return false;
-	}
-
-	return true;
-}
-
  static int
  i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
  				   struct eb_vmas *eb,
@@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
  		return -EINVAL;
  	}
-	/* We can't wait for rendering with pagefaults disabled */
-	if (pagefault_disabled() && !object_is_idle(obj))
-		return -EFAULT;
-
  	ret = relocate_entry(obj, reloc, cache, target_offset);
  	if (ret)
  		return ret;
@@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
  	remain = entry->relocation_count;
  	while (remain) {
  		struct drm_i915_gem_relocation_entry *r = stack_reloc;
-		int count = remain;
-		if (count > ARRAY_SIZE(stack_reloc))
-			count = ARRAY_SIZE(stack_reloc);
+		unsigned long unwritten;
+		unsigned int count;
+
+		count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
  		remain -= count;
-		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
+		/* This is the fast path and we cannot handle a pagefault
+		 * whilst holding the struct mutex lest the user pass in the
+		 * relocations contained within a mmaped bo. For in such a case
+		 * we, the page fault handler would call i915_gem_fault() and
+		 * we would try to acquire the struct mutex again. Obviously
+		 * this is bad and so lockdep complains vehemently.
+		 */
+		pagefault_disable();
+		unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
+		pagefault_enable();
+		if (unwritten) {
  			ret = -EFAULT;
  			goto out;
  		}
@@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
  			if (ret)
  				goto out;
-			if (r->presumed_offset != offset &&
-			    __put_user(r->presumed_offset,
-				       &user_relocs->presumed_offset)) {
-				ret = -EFAULT;
-				goto out;
+			if (r->presumed_offset != offset) {
+				/* Copying back to the user is allowed to fail.
+				 * The information passed back is a hint as
+				 * to the final location. If the copy_to_user
+				 * fails after a successful copy_from_user,
+				 * it must be a readonly location and so
+				 * we presume the user knows what they are
+				 * doing!
+				 */
+				pagefault_disable();
+				__put_user(r->presumed_offset,
+					   &user_relocs->presumed_offset);
+				pagefault_enable();
Why is a good idea to ignore potential errors here?
Wrong question: why did we think it a good idea to ignore success here?

How were we ignoring errors, I don't follow?

(a) it is safe to do so, and I can legitimately setup userspace to use
this

How what where? :)

(b) reporting an error after we have committed the change is broken
anyway.

You mean in the half way through cases?

Regards,

Tvrkto

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux