On Mon, Jan 30, 2017 at 03:42:17PM +0100, Maarten Lankhorst wrote:
> On 30-01-17 at 09:17, Daniel Vetter wrote:
> > On Fri, Jan 27, 2017 at 03:08:45PM +0000, Chris Wilson wrote:
> >> On Fri, Jan 27, 2017 at 03:58:08PM +0100, Daniel Vetter wrote:
> >>> On Fri, Jan 27, 2017 at 02:31:55PM +0000, Chris Wilson wrote:
> >>>> On Fri, Jan 27, 2017 at 03:21:29PM +0100, Daniel Vetter wrote:
> >>>>> On Fri, Jan 27, 2017 at 09:30:50AM +0000, Chris Wilson wrote:
> >>>>>> On Thu, Jan 26, 2017 at 04:59:21PM +0100, Maarten Lankhorst wrote:
> >>>>>>> When writing some testcases for nonblocking modesets, I found out
> >>>>>>> that the infinite wait on the old fb was causing issues.
> >>>>>> The crux of the issue here is the locked wait for old dependencies
> >>>>>> and the inability to inject the intel_prepare_reset disabling of
> >>>>>> all planes. There are a couple of locked waits on struct_mutex
> >>>>>> within the modeset locks, for intel_overlay and if we happen to be
> >>>>>> using the display plane for the first time.
> >>>>>>
> >>>>>> The first I suggested solving by using fences to track dependencies
> >>>>>> and keep the order between atomic states. Cancelling the
> >>>>>> outstanding modesets, replacing them with a disable and then on
> >>>>>> restore jumping to the final state looks doable. It also requires
> >>>>>> avoiding the struct_mutex for disabling, which is quite easy. To
> >>>>>> avoid the wait under struct_mutex, we've talked about switching to
> >>>>>> mmio, but for starters we could move the wait from inside
> >>>>>> intel_overlay into the fence for the atomic operation. (But that's
> >>>>>> a little more surgery than we would like for intel_overlay, I
> >>>>>> guess - dig out Ville's patches for overlay planes?) And to prevent
> >>>>>> the wait under struct_mutex for pin_to_display_plane, my plan is to
> >>>>>> move that to an async fenced operation that is then naturally
> >>>>>> waited upon by the atomic modeset.
> >>>>> A bit more of a hack, but a different idea, and I think a hack for
> >>>>> gen234.0 is ok:
> >>>>>
> >>>>> We complete all the requests with fence.error = -EIO before we start
> >>>>> the hw reset. But we do this only when we need to get at the display
> >>>>> locks. A slightly more elegant solution would be to trylock the
> >>>>> modeset locks, and if one of them fails (and only then) complete all
> >>>>> requests with -EIO to get the concurrent modeset to proceed before
> >>>>> we reset the hardware. That's essentially the logic we had before
> >>>>> all the reworks, and it worked. But I didn't look at how scary it
> >>>>> would be to make that work again ...
> >>>> The modeset lock may not just be waiting on our requests (even on pnv
> >>>> we can expect that there are already users celebrating that
> >>>> pnv+nouveau finally works ;) and the display is not the only
> >>>> user/observer of those requests. Using the requests to break the
> >>>> modeset lock just feels like the wrong approach.
> >>> It's a cycle, and we need to break it somewhere. Another option might
> >>> be to break the cycle the same way we do it for gem locks: wake up
> >>> everyone and restart the modeset ioctl. Since the trouble only happens
> >>> for synchronous modesets, where we hold the locks while waiting for
> >>> fences, we can also break out of that and restart. And I also don't
> >>> think that would leak to other drivers; after all, our gem locking
> >>> restart dances don't leak to other drivers either - it's just our own
> >>> driver's locks which are affected by these special wakeup semantics.
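
To make the trylock fallback above a bit more concrete, it would look
roughly like this (just a sketch; every sketch_* name below is a made-up
placeholder, not an existing i915 function):

/*
 * Sketch of the trylock-or-complete-with-EIO idea for reset preparation.
 * All sketch_* helpers and types are hypothetical.
 */
static void sketch_reset_prepare(struct sketch_gpu *gpu)
{
	/*
	 * Try to take the modeset locks without blocking.  A concurrent
	 * modeset that holds them may itself be waiting on requests from
	 * the hung GPU, so we must not wait for it here.
	 */
	if (!sketch_trylock_modeset_locks(gpu)) {
		/*
		 * Break the cycle: complete every outstanding request
		 * with fence.error = -EIO so the waiting modeset can
		 * finish and drop the locks ...
		 */
		sketch_complete_all_requests(gpu, -EIO);

		/* ... and only then take the locks the blocking way. */
		sketch_lock_modeset_locks(gpu);
	}

	/* Locks held, nothing left waiting on the GPU: reset can proceed. */
}
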
> >> It's a queue of nonblocking modesets that we need to worry about,
> >> afaik. Moving the wait for a blocking modeset outside of the modeset
> >> lock is easily achievable (and for avoiding the other waits under both
> >> the modeset locks + struct_mutex I have at least an idea). So the
> >> challenge is how to inject all-planes-off for gen3 and then allow the
> >> queue to continue again afterwards.
> > Hm right, I missed the nonblocking updates which don't take locks. But
> > assuming we do the display reset for gpu resets as a full modeset (i.e.
> > going through ->atomic_commit) it should still work out correctly:
> >
> > Starting state: gpu is hung, nonblocking modeset waiting for some
> > requests to complete.
> Missing one evil detail here, else things would have moved forward...
>
> An unrelated thread performs a blocking commit, and holds all locks
> until the nonblocking modeset completes.

And where is the problem in that? If we first set all fences to -EIO, and
then try to grab locks, that other thread will be able to complete. After
all, this scheme worked before we reworked the reset logic completely.
-Daniel

> > 1. hangcheck kicks in, fires off reset work.
> >
> > 2. We complete all requests with fence.error = -EIO and wake up any
> > waiters. That means no re-queueing for older platforms, but oh well.
> >
> > 3. We grab all the display locks. Nothing happens yet.
> >
> > 4. We reset the chip, display dies.
> >
> > 5. We run ->atomic_commit to restore things. This will also force the
> > nonblocking commit worker to complete before this display restore
> > touches anything.
> >
> > The only trouble I see is that the nonblocking worker can still touch
> > the display block while we kill it, which isn't awesome. But we can fix
> > that by waiting for all pending nonblocking commits in step 3 manually
> > (without calling into atomic_commit), as long as we do step 2.
> >
> > So completing everything with -EIO unconditionally still seems like the
> > simplest option that actually works for pre-g4x ...
> > -Daniel

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
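
PS: for reference, steps 1-5 above as a single reset worker, again only a
sketch with made-up sketch_* helpers rather than real i915 code (step 1 is
whatever hangcheck uses to kick this work off):

static void sketch_reset_work(struct sketch_gpu *gpu)
{
	/* 2. Complete all requests with fence.error = -EIO and wake any
	 *    waiters, so nothing can keep blocking on the dead GPU. */
	sketch_complete_all_requests(gpu, -EIO);

	/* 3. Grab all the display locks; also flush pending nonblocking
	 *    commits here so their workers stop touching the hardware. */
	sketch_lock_modeset_locks(gpu);
	sketch_flush_nonblocking_commits(gpu);

	/* 4. Reset the chip; on gen3 the display dies with it. */
	sketch_hw_reset(gpu);

	/* 5. Restore the display state with a full ->atomic_commit. */
	sketch_restore_display_state(gpu);

	sketch_unlock_modeset_locks(gpu);
}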