On Thu, Oct 08, 2015 at 10:45:47AM +0100, Tvrtko Ursulin wrote: > > On 28/09/15 15:14, Daniel Vetter wrote: > >On Mon, Sep 28, 2015 at 02:52:30PM +0100, Chris Wilson wrote: > >>On Mon, Sep 28, 2015 at 03:42:22PM +0200, Daniel Vetter wrote: > >>>On Wed, Sep 23, 2015 at 09:07:24PM +0100, Chris Wilson wrote: > >>>>If the client revokes the virtual address it asked to be mapped into GPU > >>>>space via userptr (by using anything along the lines of mmap, mprotect, > >>>>madvise, munmap, ftruncate etc) the mmu notifier sends a range > >>>>invalidate command to userptr. Upon receiving the invalidation signal > >>>>for the revoked range, we try to release the struct pages we pinned into > >>>>the GTT. However, this can fail if any of the GPU's VMA are pinned for > >>>>use by the hardware (i.e. despite the user's intention we cannot > >>>>relinquish the client's address range and keep uptodate with whatever is > >>>>placed in there). Currently we emit a few WARN so that we would notice > >>>>if this every occurred in the wild; it has. Sadly this means we need to > >>>>replace those WARNs with the proper SIGBUS to the offending clients > >>>>instead. > >>>> > >>>>Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >>>>Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >>>>Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx> > >>>>--- > >>>> drivers/gpu/drm/i915/i915_gem_userptr.c | 41 +++++++++++++++++++++++++++++---- > >>>> 1 file changed, 37 insertions(+), 4 deletions(-) > >>>> > >>>>diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c > >>>>index f75d90118888..efb404b9fe69 100644 > >>>>--- a/drivers/gpu/drm/i915/i915_gem_userptr.c > >>>>+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c > >>>>@@ -81,11 +81,44 @@ static void __cancel_userptr__worker(struct work_struct *work) > >>>> was_interruptible = dev_priv->mm.interruptible; > >>>> dev_priv->mm.interruptible = false; > >>>> > >>>>- list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link) { > >>>>- int ret = i915_vma_unbind(vma); > >>>>- WARN_ON(ret && ret != -EIO); > >>>>+ list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link) > >>>>+ i915_vma_unbind(vma); > >>>>+ if (i915_gem_object_put_pages(obj)) { > >>>>+ struct task_struct *p; > >>>>+ > >>>>+ DRM_ERROR("Unable to revoke ownership by userptr of" > >>>>+ " invalidated address range, sending SIGBUS" > >>>>+ " to attached clients.\n"); > >>>>+ > >>>>+ rcu_read_lock(); > >>>>+ for_each_process(p) { > >>>>+ siginfo_t info; > >>>>+ > >>>>+ /* This doesn't capture everyone who has > >>>>+ * the pages pinned behind a VMA as we > >>>>+ * do not have that tracking information > >>>>+ * available, it does however kill the > >>>>+ * original process (and siblings) who > >>>>+ * created the userptr and presumably tried > >>>>+ * to reuse the address space despite having > >>>>+ * pinned it (possibly indirectly) to the hw. > >>>>+ * Arguably, we don't even want to kill the > >>>>+ * other processes as they are not at fault, > >>>>+ * likely to be a display server, and hopefully > >>>>+ * will release the pages in due course once > >>>>+ * the client is dead. > >>>>+ */ > >>>>+ if (p->mm != obj->userptr.mm->mm) > >>>>+ continue; > >>>>+ > >>>>+ info.si_signo = SIGBUS; > >>>>+ info.si_errno = 0; > >>>>+ info.si_code = BUS_ADRERR; > >>>>+ info.si_addr = (void __user *)obj->userptr.ptr; > >>>>+ force_sig_info(SIGBUS, &info, p); > >>>>+ } > >>>>+ rcu_read_unlock(); > >>> > >>>Why do we need to send a SIGBUS? It won't tear down the offending gem bo, > >>>any new users will hopefully get it, and abusing SIGBUS without the thread > >>>actually doing a memory access is a bit surprising. DRM_DEBUG seems to be > >>>the most we can do here I think - I think userspace being able to do this > >>>is just a fundamental property of userptr. > >> > >>It is not the bo that is at fault but the *client's* *address* *space* > >>that is being changed. It is equivalent to mmap on a truncated file i.e. > >>if the client tries to use its mmapping after it has truncated the file > >>it is scolded via SIGBUS. > > > >But existing SIGBUS is thread-bound and comes with the fault address > >attached. This is just the gpu being a bit unhappy, so the SIGBUS comes > >out of complete nowhere to smack the userspace thread. Any kind of SIGBUS > >catcher userspace has for other reasons might be supremely surprised by > >this and do stupid things. Hence I don't think throwing SIGBUS here is > >correct behaviour. And there doesn't seem to be anything else suitable > >really. > > Te offending address is provided with the signal as far as I can see. > > I think it is fine to do this, even required since the alternative is for > GPU to keep using random memory indefinitely and userspace never gets to > know. > > And I don't see any reason to keep the process running who did such an > elementary and serious mistake. > > Is the only concern that the process can catch it and not exit? The concern is that this isn't how SIG_SEGV works, it's a signal the thread who made the invalid access gets directly. You never get a SIG_SEGV for bad access someone else has made. So essentially it's new ABI. And the other bit is that doing new ABI with signals is considered bad taste - if you want to have something that gets to userspace directly then create an fd which is polleable. And then, if userspace chooses so, it can kill itself with a SIG_IO using fcntl. But the goal here seems to just to be telling userspace that something bad has happened with its gpu context, and we already have gem_reset_stats for that. That would imo be the correct interface to provide this information to userspace. After all after a gpu reset it's also all garbage and we just let everything continue merilly without killing the process that submitted the work. I don't see how a fault is any different. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx