On Wed, Jul 13, 2016 at 04:57:19PM +0200, Daniel Vetter wrote: > On Wed, Jul 13, 2016 at 02:40:50PM +0200, Peter Wu wrote: > > On Wed, Jul 13, 2016 at 11:54:49AM +0200, Daniel Vetter wrote: > > > On Tue, Jul 12, 2016 at 06:49:34PM +0200, Peter Wu wrote: > > > > The FBIOPUT_CON2FBMAP ioctl takes a console_lock(). When this is called > > > > while nouveau was runtime suspended, a deadlock would occur due to > > > > nouveau_fbcon_set_suspend also trying to obtain console_lock(). > > > > > > > > Fix this by delaying the drm_fb_helper_set_suspend call. Based on the > > > > i915 code (which was done for performance reasons though). > > > > > > > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > > > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> > > > > Signed-off-by: Peter Wu <peter@xxxxxxxxxxxxx> > > > > --- > > > > Tested on top of v4.7-rc5, the deadlock is gone. > > > > > > If we bother with this, it should imo be moved into the drm_fb_helper.c > > > function drm_fb_helper_set_suspend(). But this also smells like some kind > > > of bad duct-tape. I think Lukas is working on some other rpm vs. fbdev > > > deadlocks, maybe we could fix them all with one proper fix? I've made some > > > comments on Lukas' last patch series. > > > > This patch is only needed for drivers that use console_lock (for > > drm_fb_helper_set_suspend) in their runtime resume functions. > > Lukas posted fixes for runtime PM reference leaks, those are different > > from this deadlock (see > > https://lists.freedesktop.org/archives/dri-devel/2016-July/113005.html > > for a backtrace for this issue). > > > > The deadlock could also be avoided if the device backing the fbcon is > > somehow runtime-resumed outside the lock, but that feels like a larger > > hack that does not seem easy. > > > > The i915 patch was done to reduce resume time (due to console_lock > > contention), that feature seems useful to all other drivers too even if > > the deadlock is fixed in a different way. > > I might have imagined something, but I thought Lukas is already working on > some rpm vs. vga_switcheroo inversions. But looking again this was on the > audio side. > > I think the proper solution for fbcon would be for the fbdev emulation to > also hold a runtime pm references when the console is in use. nouveau does this by adding a fb_open and fb_release function that calls pm_runtime_get and pm_runtime_put respectively (and is the only driver doing this). So that is why it causes the console_lock issue for nouveau, but not for others. > This should already happen correctly for vblank, the more tricky part > is fbdev mmap and fbcon: > > - I have no idea, but there should be a way to intercept fbdev userspace > mmaps and make sure that as long as an app has the fbdev mmapped we > don't runtime suspend. No one really should be doing that (at least for > normal setups), hence this shouldn't result in unsafe. mmap normally needs a fd right? When the chardev /dev/fbX is opened, the fb_open function will be called. So nouveau should not have this issue with mmap/read/write to a fb while the device is suspended. (This RPM thing was added in f231976c2e89 ("drm/nouveau/fbcon: take runpm reference when userspace has an open fd"), maybe it is not a bad idea for other drivers with RPM support either.) > - For fbcon, we could suspend it in the dpms off callbacks (maybe with a > timer), and resume it only when enabling fbcon again - fbcon needs to > redraw anyway on dpms on. Would this guarantee that the fb is suspended before/during suspend (of the graphics device) and resumed somewhere during/after resume? Suspending the fb also has the effect that reads/writes to /dev/fbN fail, maybe that is a bit too restricted since the framebuffer is just accessible until the device is suspended. (Hmm, fb_read/fb_write does not seem to do any locking...) > Another solution for fbcon might be to untangle the suspend/resume stuff > and protect it by something else than console_lock. But that means > fixing up fbcon locking horror shows. console_lock seems needed for some code down the call stack, removing it risks some blow ups. Some archaeology: this locking problem was introduced with 054430e773c9 ("fbcon: fix locking harder"). In the past fb_set_suspend also took the fb_info lock but that was removed in 9e769ff3f585 ("fb: avoid possible deadlock caused by fb_set_suspend"). Peter > > My current plan is to move stuff out of the lock and allow (just) > > resuming the console to be delayed. Some drivers (nouveau, > > radeon/amdgpu, i915) do unnecessary stuff under the console lock: > > > > - nouveau: I *think* that cleraing/setting FBINFO_HWACCEL_DISABLED > > (nouveau_fbcon_accel_restore) is safe outside the lock as the fb is > > already suspended before clearing/after setting the flag. > > - radeon: since the console is suspended, I don't think that that all > > of the code is radeon_resume_kms is really needed. > > - amdgpu: same as radeon. Btw, console_lock is leaked on an error path. > > - i915: I think that clearing the fb memory can be done outside the > > lock too as the console is suspended. > > > > Please correct me if my assumptions are flawed. > > Yeah, fixing this independent issues should definitely help, irrespective > of how we fix fb_set_suspend. > > > > Besides this, when fixing a deadlock pls provide more details about the > > > precise callchain and the locks involved in the deadlock. If you > > > discovered this using lockdep, then just add the entire lockdep splat to > > > the commit message. Otherwise there's lots of guesswork involved here. > > > -Daniel > > > > There was no lockdep splat, it was triggered via the ioctl in the commit > > message. I'll include the verbose trace from the previous mail in the > > next proposed patch to reduce hunting though. > > Sounds good too. > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel