Hey,

On 31-01-17 08:46, Daniel Vetter wrote:
> On Mon, Jan 30, 2017 at 03:42:17PM +0100, Maarten Lankhorst wrote:
>> On 30-01-17 09:17, Daniel Vetter wrote:
>>> On Fri, Jan 27, 2017 at 03:08:45PM +0000, Chris Wilson wrote:
>>>> On Fri, Jan 27, 2017 at 03:58:08PM +0100, Daniel Vetter wrote:
>>>>> On Fri, Jan 27, 2017 at 02:31:55PM +0000, Chris Wilson wrote:
>>>>>> On Fri, Jan 27, 2017 at 03:21:29PM +0100, Daniel Vetter wrote:
>>>>>>> On Fri, Jan 27, 2017 at 09:30:50AM +0000, Chris Wilson wrote:
>>>>>>>> On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
>>>>>>>>> When writing some testcases for nonblocking modesets, I found out that the
>>>>>>>>> infinite wait on the old fb was causing issues.
>>>>>>>> The crux of the issue here is the locked wait for old dependencies and
>>>>>>>> the inability to inject the intel_prepare_reset disabling of all planes.
>>>>>>>> There are a couple of locked waits on struct_mutex within the modeset
>>>>>>>> locks for intel_overlay and if we happen to be using the display plane
>>>>>>>> for the first time.
>>>>>>>>
>>>>>>>> The first I suggested solving using fences to track dependencies and
>>>>>>>> keep the order between atomic states. Cancelling the outstanding
>>>>>>>> modesets, replacing with a disable and then on restore jumping to the
>>>>>>>> final state looks doable. It also requires avoiding the struct_mutex for
>>>>>>>> disabling, which is quite easy. To avoid the wait under struct_mutex,
>>>>>>>> we've talked about switching to mmio, but for starters we could move the
>>>>>>>> wait from inside intel_overlay into the fence for the atomic operation.
>>>>>>>> (But that's a little more surgery than we would like for intel_overlay I
>>>>>>>> guess - dig out Ville's patches for overlay planes?) And to prevent the
>>>>>>>> wait under struct_mutex for pin_to_display_plane, my plan is to move
>>>>>>>> that to an async fenced operation that is then naturally waited upon by
>>>>>>>> the atomic modeset.
>>>>>>> A bit more of a hack, but a different idea, and I think a hack for
>>>>>>> gen234.0 is ok:
>>>>>>>
>>>>>>> We complete all the requests before we start the hw reset with fence.error
>>>>>>> = -EIO. But we do this only when we need to get at the display locks. A
>>>>>>> slightly more elegant solution would be to trylock the modeset locks, and
>>>>>>> if one of them fails (and only then) complete all requests with -EIO to
>>>>>>> get the concurrent modeset to proceed before we reset the hardware. That's
>>>>>>> essentially the logic we had before all the reworks, and it worked. But I
>>>>>>> didn't look at how scary all of that would be to make work again ...
>>>>>> The modeset lock may not just be waiting on our requests (even on pnv we
>>>>>> can expect that there are already users celebrating that pnv+nouveau
>>>>>> finally works ;) and the display is not the only user/observer of
>>>>>> those requests. Using the requests to break the modeset lock just feels
>>>>>> like the wrong approach.
>>>>> It's a cycle, and we need to break it somewhere. Another option might be
>>>>> to break the cycle the same way we do it for gem locks: wake up everyone
>>>>> and restart the modeset ioctl. Since the trouble only happens for
>>>>> synchronous modesets where we hold the locks while waiting for fences, we
>>>>> can also break out of that and restart.
>>>>> And I also don't think that would leak to other drivers; after all, our
>>>>> gem locking restart dances also don't leak to other drivers - it's just
>>>>> our own driver's locks which are affected by these special wakeup
>>>>> semantics.
>>>> It's a queue of nonblocking modesets that we need to worry about, afaik.
>>>> Moving the wait for a blocking modeset outside of the modeset lock is
>>>> easily achievable (and I have at least an idea for avoiding the other
>>>> waits under both the modeset locks + struct_mutex). So the challenge is
>>>> how to inject all-planes-off for gen3 and then allow the queue to continue
>>>> again afterwards.
>>> Hm right, I missed the nonblocking updates which don't take locks. But
>>> assuming we do the display reset for gpu resets as a full modeset (i.e.
>>> going through ->atomic_commit) it should still work out correctly:
>>>
>>> Starting state: gpu is hung, nonblocking modeset waiting for some requests
>>> to complete.
>> Missing one evil detail here, else things would have moved forward...
>>
>> An unrelated thread performs a blocking commit, and holds all locks until
>> the nonblocking modeset completes.
> And where is the problem in that? If we first set all fences to -EIO, and
> then try to grab locks, that other thread will be able to complete. After
> all, this scheme worked before we reworked the reset logic completely.
True, but we probably still want to cap the timeout (patch 2) to prevent a
deadlock when a fence from another driver misbehaves. And if we have a
timeout, things will move forward eventually even if we wait for locks,
though it might still be a good idea to complete everything with -EIO first
to make that happen faster. :)

~Maarten
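
As a rough illustration of the capped wait being discussed (not the actual
patch 2): instead of an unbounded wait on the old framebuffer's fence, bound
it with dma_fence_wait_timeout() so a stuck or misbehaving foreign fence
cannot hold the modeset locks forever. The helper name and the 10-second cap
below are made up for this sketch; only the dma_fence_wait_timeout() call and
its return convention reflect the real kernel API.

#include <linux/dma-fence.h>
#include <linux/jiffies.h>

/* Illustrative cap only; the real patch may pick a different value. */
#define SKETCH_OLD_FB_TIMEOUT	(10 * HZ)

/*
 * Wait for the fence attached to the old fb, but never longer than the cap.
 * Returns 0 if the fence signalled, -ETIMEDOUT if the cap was hit (so the
 * caller can decide to proceed, e.g. after completing things with -EIO), or
 * a negative error such as -ERESTARTSYS if interrupted.
 */
static int sketch_wait_for_old_fb(struct dma_fence *fence)
{
	long ret;

	if (!fence)
		return 0;

	/* Interruptible, bounded wait: >0 signalled, 0 timed out, <0 error. */
	ret = dma_fence_wait_timeout(fence, true, SKETCH_OLD_FB_TIMEOUT);
	if (ret == 0)
		return -ETIMEDOUT;

	return ret < 0 ? ret : 0;
}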