On Tue, Oct 13, 2015 at 02:09:48PM +0100, Chris Wilson wrote: > On Tue, Oct 13, 2015 at 02:23:57PM +0200, Daniel Vetter wrote: > > On Tue, Oct 13, 2015 at 12:44:05PM +0100, Chris Wilson wrote: > > > On Tue, Oct 13, 2015 at 01:26:36PM +0200, Daniel Vetter wrote: > > > > On Mon, Oct 12, 2015 at 10:31:35AM +0100, Chris Wilson wrote: > > > > > On Mon, Oct 12, 2015 at 10:06:23AM +0100, Tvrtko Ursulin wrote: > > > > > > > > > > > > On 09/10/15 18:26, Chris Wilson wrote: > > > > > > >On Fri, Oct 09, 2015 at 07:14:02PM +0200, Daniel Vetter wrote: > > > > > > >>On Fri, Oct 09, 2015 at 10:03:14AM +0100, Tvrtko Ursulin wrote: > > > > > > >>> > > > > > > >>>On 09/10/15 09:55, Daniel Vetter wrote: > > > > > > >>>>On Fri, Oct 09, 2015 at 09:40:53AM +0100, Chris Wilson wrote: > > > > > > >>>>>On Fri, Oct 09, 2015 at 09:48:01AM +0200, Daniel Vetter wrote: > > > > > > >>>>>>On Thu, Oct 08, 2015 at 10:45:47AM +0100, Tvrtko Ursulin wrote: > > > > > > >>>>>>The concern is that this isn't how SIG_SEGV works, it's a signal the > > > > > > >>>>>>thread who made the invalid access gets directly. You never get a SIG_SEGV > > > > > > >>>>>>for bad access someone else has made. So essentially it's new ABI. > > > > > > >>>>> > > > > > > >>>>>SIGBUS. For which the answer is yes, you can and do get SIGBUS for > > > > > > >>>>>actions taken by other processes. > > > > > > >>>> > > > > > > >>>>Oh right I always forget that SIGBUS aliases with SIGIO. Anyway if > > > > > > >>>>userspace wants SIGIO we just need to provide it with a pollable fd and > > > > > > >>>>then it can use fcntl to make that happen. That's imo a much better api > > > > > > >>>>than unconditionally throwing around signals. Also we already have the > > > > > > >>>>reset stats ioctl to tell userspace that its gpu context is toats. If > > > > > > >>>>anyone wants that to be pollable (or even send SIGIO) I think we should > > > > > > >>>>extend that, with all the usual "needs userspace&igt" stuff on top. > > > > > > >>> > > > > > > >>>I don't see that this notification can be optional. Process is confused > > > > > > >>>about its memory map use so should die. :) > > > > > > >>> > > > > > > >>>This is not a GPU error/hang - this is the process doing stupid things. > > > > > > >>> > > > > > > >>>MMU notifiers do not support decision making otherwise we could say > > > > > > >>>-ETXTBUSY or something on munmap, but we can't. Not even sure that it would > > > > > > >>>help in all cases, would have to fail clone as well and who knows what. > > > > > > >> > > > > > > >>So what happens if the gpu just keeps using the memory? It'll all be > > > > > > >>horribly undefined behaviour and eventually it'll die on an -EFAULT in > > > > > > >>execbuf, but does anything else bad happen? > > > > > > > > > > > > > >We don't see an EFAULT unless a miracle occurs, and the stale pages > > > > > > >continue to be read/written by other processes (as well as the client). > > > > > > >Horribly undefined behaviour with a misinformation leak. > > > > > > > > > > > > What other processes? Pages will still be referenced so won't be > > > > > > reused so there is not information leak across unrelated processes. > > > > > > Unless you meant ones involved in object sharing? > > > > > > > > > > This client is trying to replace the userptr with a fresh set of pages. > > > > > The GPU and other processes continue to see the old pages i.e. old > > > > > information (misinformation :) leaks. > > > > > > > > userptr explicitly doesn't support this. You need to tear down the > > > > existing userptr object and then create a new one if you change the > > > > mmap'ing. So that's really just a bug in userspace, and the question is > > > > how do we tell userspace best that it's done something stupid. > > > > > > Pardon? Note this also affects munmap if you don't accept mremap (which > > > is not explicitly unsupported as it fits quite nicely within the > > > existing rules). > > > > > > > My stance that the best one is to either kill any ctx using that object or > > > > at least indicate there's trouble using reset stats. But sending a > > > > SIGBUS/SIG_SEGV (which can only happen to the thread that does a memory > > > > access, not any other thread that's accidentally in the same process > > > > group) is imo abuse. > > > > > > The signal is sent to everything that inherited the mm, not bound to the > > > single thread. > > > > > > > Or we just need to make sure we do get the EFAULT on > > > > the next execbuffer. > > > > > > > > Or maybe it just doesn't matter, i.e. where is the userspace which a) does > > > > silly stuff like this b) wants proper notification? Adding ABI just > > > > because we can't isn't going to get merged. > > > > > > No client wants to be killed just because it does something stupid, it > > > is killed to protect the integrity of the system. > > > > But killing the client won't get rid of the ctx/objects, so won't really > > solve all that much? Especially it won't get rid of the framebuffers, > > which is the real trouble here it seems. That's because logind keeps a > > duped copy of the fd for it's own purposes. > > I mentioned that in the commit message - but do note that what is pinned > is what the client has mmapped at that time. Before it is changed, the > client is killed and so we do not get into the situation where the > output is invalid (just old). > > > So the better fix would be to make sure we don't accidentally pin a > > userptr object, i.e. adding a check in intel_pin_and_fence_fb for that > > (plus igt testcase). I somehow missed in all this discussion that this is > > about pinned objects ;-) > > Preventing wrapping a userptr in a fb is one possibility. > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c > index b89131654a0e..0cc689d0ce0a 100644 > --- a/drivers/gpu/drm/i915/intel_display.c > +++ b/drivers/gpu/drm/i915/intel_display.c > @@ -14116,6 +14116,9 @@ static int intel_user_framebuffer_create_handle(struct drm_framebuffer *fb, > struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb); > struct drm_i915_gem_object *obj = intel_fb->obj; > > + if (obj->userptr.mm) > + return -EINVAL; > + > return drm_gem_handle_create(file, &obj->base, handle); > } > > And we can return to this discussion the next time someone mentions > hitting the WARN. Or they complain about not being able to create fb. Yeah I think I much prefer this stop-gap measure until we have a real need. sob + igt + discussion summary and I'll apply this. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx