On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote: > > On 17/10/2016 15:10, Chris Wilson wrote: > >When handling execbuf relocations, we play a delicate dance with > >pagefault. We first try to access the user pages underneath our > >struct_mutex. However, if those pages were inside a GEM object, we may > >trigger a pagefault and deadlock as i915_gem_fault() tries to > >recursively acquire struct_mutex. Instead, we choose to disable > >pagefaulting around the copy_from_user whilst inside the struct_mutex > >and handle the EFAULT by falling back to a copy outside the > >struct_mutex. > > > >We however presumed that disabling pagefaults would be expensive. It is > >just an operation on the local current task. Cheap enough that we can > >restrict the disable/enable to the critical section around the copy, and > >so avoid having to handle the atomic sections within the relocation > >handling itself. > > > >Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> > >Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> > >--- > > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++----------------- > > 1 file changed, 28 insertions(+), 36 deletions(-) > > > >diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > >index 1d02e74ce62d..22342ad0e07f 100644 > >--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c > >+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > >@@ -551,20 +551,6 @@ repeat: > > return 0; > > } > >-static bool object_is_idle(struct drm_i915_gem_object *obj) > >-{ > >- unsigned long active = i915_gem_object_get_active(obj); > >- int idx; > >- > >- for_each_active(active, idx) { > >- if (!i915_gem_active_is_idle(&obj->last_read[idx], > >- &obj->base.dev->struct_mutex)) > >- return false; > >- } > >- > >- return true; > >-} > >- > > static int > > i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj, > > struct eb_vmas *eb, > >@@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj, > > return -EINVAL; > > } > >- /* We can't wait for rendering with pagefaults disabled */ > >- if (pagefault_disabled() && !object_is_idle(obj)) > >- return -EFAULT; > >- > > ret = relocate_entry(obj, reloc, cache, target_offset); > > if (ret) > > return ret; > >@@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma, > > remain = entry->relocation_count; > > while (remain) { > > struct drm_i915_gem_relocation_entry *r = stack_reloc; > >- int count = remain; > >- if (count > ARRAY_SIZE(stack_reloc)) > >- count = ARRAY_SIZE(stack_reloc); > >+ unsigned long unwritten; > >+ unsigned int count; > >+ > >+ count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc)); > > remain -= count; > >- if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) { > >+ /* This is the fast path and we cannot handle a pagefault > >+ * whilst holding the struct mutex lest the user pass in the > >+ * relocations contained within a mmaped bo. For in such a case > >+ * we, the page fault handler would call i915_gem_fault() and > >+ * we would try to acquire the struct mutex again. Obviously > >+ * this is bad and so lockdep complains vehemently. > >+ */ > >+ pagefault_disable(); > >+ unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0])); > >+ pagefault_enable(); > >+ if (unwritten) { > > ret = -EFAULT; > > goto out; > > } > >@@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma, > > if (ret) > > goto out; > >- if (r->presumed_offset != offset && > >- __put_user(r->presumed_offset, > >- &user_relocs->presumed_offset)) { > >- ret = -EFAULT; > >- goto out; > >+ if (r->presumed_offset != offset) { > >+ /* Copying back to the user is allowed to fail. > >+ * The information passed back is a hint as > >+ * to the final location. If the copy_to_user > >+ * fails after a successful copy_from_user, > >+ * it must be a readonly location and so > >+ * we presume the user knows what they are > >+ * doing! > >+ */ > >+ pagefault_disable(); > >+ __put_user(r->presumed_offset, > >+ &user_relocs->presumed_offset); > >+ pagefault_enable(); > > Why is a good idea to ignore potential errors here? Wrong question: why did we think it a good idea to ignore success here? (a) it is safe to do so, and I can legitimately setup userspace to use this (b) reporting an error after we have committed the change is broken anyway. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx