On Fri, Jun 30, 2017 at 03:30:33PM +0200, Daniel Vetter wrote: > On Fri, Jun 30, 2017 at 03:25:58PM +0200, Daniel Vetter wrote: > > On Thu, Jun 29, 2017 at 04:49:48PM +0300, ville.syrjala@xxxxxxxxxxxxxxx wrote: > > > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > > > > > > Introduce an rw_semaphore to protect the display commits. All normal > > > commits use down_read() and hence can proceed in parallel, but GPU reset > > > will use down_write() making sure no other commits are in progress when > > > we have to pull the plug on the display engine on pre-g4x platforms. > > > There are no modeset/gem locks taken inside __intel_atomic_commit_tail() > > > itself, and we wait for all dependencies before the down_read(), and > > > thus there is no chance of deadlocks with this scheme. > > > > How does this solve the deadlock? Afaiui the deadlock is that the gpu > > reset stopped unconditionally completing all requests before it did > > anything else, including anything with the hw or taking modeset locks. > > > > This ensured that any outstanding flips (we only had pageflips, no atomic > > ones back then) could complete (although maybe displaying garbage). The > > only thing we had to do was grab the locks (to avoid concurrent modesets) > > and then we could happily nuke the hw (since the flips where lost in the > > CS anyway), and restore it afterwards. > > > > Then the TDR rewrite came around and broke that, which now makes atomic > > time out waiting for the gpu to complete (because the gpu reset waits for > > the modeset to complete first). Which means if you want to fix this > > without breaking TDR, then you need to cancel the pending atomic commits. > > That seems somewhat what you're attempting here with trying to figure out > > what the latest hw-committed step is (but that function gets it wrong), > > and lots more locking and stuff on top. > > > > Why exactly can't we just go back to the old world of force-completing > > dead requests on gen4 and earlier? That would be tons simpler imo instead > > of throwing a pile of hacks (which really needs a complete rewrite of the > > atomic code in i915) in as a work-around. We didn't have TDR on gen4 and > > earlier for years, going back to that isn't going to hurt anyone. > > > > Making working gen4 gpu reset contigent on cancellable atomic commits and > > the full commit queue is imo nuts. > > And if the GEM folks insist the old behavior can't be restored, then we > just need a tailor-made get-out-of-jail card for gen4 gpu reset somewhere > in i915_sw_fence. Force-completing all render requests atomic updates > depend upon is imo the simplest solution to this, and we've had a driver > that worked like that for years. And it used to break all the time. I think we've had to fix it at least three times by now. So I tend to think it's better to fix it in a way that won't break so easily. -- Ville Syrjälä Intel OTC