On Thu, Feb 09, 2017 at 04:39:29PM +0200, Jani Nikula wrote: > On Wed, 04 Jan 2017, Daniel Vetter <daniel@xxxxxxxx> wrote: > > On Wed, Dec 21, 2016 at 12:08:45PM +0100, Maarten Lankhorst wrote: > >> Op 21-12-16 om 11:36 schreef Chris Wilson: > >> > On Wed, Dec 21, 2016 at 11:23:30AM +0100, Daniel Vetter wrote: > >> >> When writing the generic nonblocking commit code I assumed that > >> >> through clever lifetime management I can assure that the completion > >> >> (stored in drm_crtc_commit) only gets freed after it is completed. And > >> >> that worked. > >> >> > >> >> I also wanted to make nonblocking helpers resilient against driver > >> >> bugs, by having timeouts everywhere. And that worked too. > >> >> > >> >> Unfortunately taking boths things together results in oopses :( Well, > >> >> at least sometimes: What seems to happen is that the drm event hangs > >> >> around forever stuck in limbo land. The nonblocking helpers eventually > >> >> time out, move on and release it. Now the bug I tested all this > >> >> against is drivers that just entirely fail to deliver the vblank > >> >> events like they should, and in those cases the event is simply > >> >> leaked. But what seems to happen, at least sometimes, on i915 is that > >> >> the event is set up correctly, but somohow the vblank fails to fire in > >> >> time. Which means the event isn't leaked, it's still there waiting for > >> >> evevntually a vblank to fire. That tends to happen when re-enabling the > >> >> pipe, and then the trap springs and the kernel oopses. > >> >> > >> >> The correct fix here is simply to refcount the crtc commit to make > >> >> sure that the event sticks around even for drivers which only > >> >> sometimes fail to deliver vblanks for some arbitrary reasons. Since > >> >> crtc commits are already refcounted that's easy to do. > >> > Or make the event a part of the atomic state? > >> > -Chris > >> > > >> afaict crtc commit is already taken to wait for completion, so this patch makes sense. > >> > >> There's just a minor typo in the subject. :) > >> Not sure that release_commit should be a function pointer, regardless.. > >> > >> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx> > > > > It didn't help the bug reporters against oopses (but the reporters are > > supremely confusing, I have no idea what's really being tested, the > > bugzilla is a mess), but I still think the patch is useful for more > > robuestness, I dropped the cc: stable and applied it to drm-misc. > > Agreed on the bug [1] being a mess. However, the bug has a reliable > bisect result, the revert was posted by some of the reporters on the > lists and in the bug, and now something that will not help anyone in > v4.9 or v4.10 was pushed. :( Latest report just says that the revert isn't helping either. I suspect the report is a giantic conflagration of everything ever that kills various reporters boxes. I still believe that the patch here fixes the original bug, but there might be a lot more hiding. It's at least seen quite a pile of testing, so I think it's sounds, and we could cherry-pick it to dinf with cc: stable for 4.9+. Worst case it's not going to help for the other problems. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx