Re: [PATCH] drm/nouveau/fbcon: fix deadlock with FBIOPUT_CON2FBMAP

Peter Wu <peter@xxxxxxxxxxxxx> · Fri, 15 Jul 2016 13:10:51 +0200

On Wed, Jul 13, 2016 at 04:57:19PM +0200, Daniel Vetter wrote:
> On Wed, Jul 13, 2016 at 02:40:50PM +0200, Peter Wu wrote:
> > On Wed, Jul 13, 2016 at 11:54:49AM +0200, Daniel Vetter wrote:
> > > On Tue, Jul 12, 2016 at 06:49:34PM +0200, Peter Wu wrote:
> > > > The FBIOPUT_CON2FBMAP ioctl takes a console_lock(). When this is called
> > > > while nouveau was runtime suspended, a deadlock would occur due to
> > > > nouveau_fbcon_set_suspend also trying to obtain console_lock().
> > > > 
> > > > Fix this by delaying the drm_fb_helper_set_suspend call. Based on the
> > > > i915 code (which was done for performance reasons though).
> > > > 
> > > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> > > > Signed-off-by: Peter Wu <peter@xxxxxxxxxxxxx>
> > > > ---
> > > > Tested on top of v4.7-rc5, the deadlock is gone.
> > > 
> > > If we bother with this, it should imo be moved into the drm_fb_helper.c
> > > function drm_fb_helper_set_suspend(). But this also smells like some kind
> > > of bad duct-tape. I think Lukas is working on some other rpm vs. fbdev
> > > deadlocks, maybe we could fix them all with one proper fix? I've made some
> > > comments on Lukas' last patch series.
> > 
> > This patch is only needed for drivers that use console_lock (for
> > drm_fb_helper_set_suspend) in their runtime resume functions.
> > Lukas posted fixes for runtime PM reference leaks, those are different
> > from this deadlock (see
> > https://lists.freedesktop.org/archives/dri-devel/2016-July/113005.html
> > for a backtrace for this issue).
> > 
> > The deadlock could also be avoided if the device backing the fbcon is
> > somehow runtime-resumed outside the lock, but that feels like a larger
> > hack that does not seem easy.
> > 
> > The i915 patch was done to reduce resume time (due to console_lock
> > contention), that feature seems useful to all other drivers too even if
> > the deadlock is fixed in a different way.
> 
> I might have imagined something, but I thought Lukas is already working on
> some rpm vs. vga_switcheroo inversions. But looking again this was on the
> audio side.
> 
> I think the proper solution for fbcon would be for the fbdev emulation to
> also hold a runtime pm references when the console is in use.

nouveau does this by adding a fb_open and fb_release function that calls
pm_runtime_get and pm_runtime_put respectively (and is the only driver
doing this). So that is why it causes the console_lock issue for
nouveau, but not for others.

> This should already happen correctly for vblank, the more tricky part
> is fbdev mmap and fbcon:
> 
> - I have no idea, but there should be a way to intercept fbdev userspace
>   mmaps and make sure that as long as an app has the fbdev mmapped we
>   don't runtime suspend. No one really should be doing that (at least for
>   normal setups), hence this shouldn't result in unsafe.

mmap normally needs a fd right? When the chardev /dev/fbX is opened, the
fb_open function will be called. So nouveau should not have this issue
with mmap/read/write to a fb while the device is suspended.

(This RPM thing was added in f231976c2e89 ("drm/nouveau/fbcon: take
runpm reference when userspace has an open fd"), maybe it is not a bad
idea for other drivers with RPM support either.)

> - For fbcon, we could suspend it in the dpms off callbacks (maybe with a
>   timer), and resume it only when enabling fbcon again - fbcon needs to
>   redraw anyway on dpms on.

Would this guarantee that the fb is suspended before/during suspend (of
the graphics device) and resumed somewhere during/after resume?

Suspending the fb also has the effect that reads/writes to /dev/fbN
fail, maybe that is a bit too restricted since the framebuffer is just
accessible until the device is suspended.

(Hmm, fb_read/fb_write does not seem to do any locking...)

>   Another solution for fbcon might be to untangle the suspend/resume stuff
>   and protect it by something else than console_lock. But that means
>   fixing up fbcon locking horror shows.

console_lock seems needed for some code down the call stack, removing it
risks some blow ups.

Some archaeology: this locking problem was introduced with 054430e773c9
("fbcon: fix locking harder"). In the past fb_set_suspend also took the
fb_info lock but that was removed in 9e769ff3f585 ("fb: avoid possible
deadlock caused by fb_set_suspend").

Peter

> > My current plan is to move stuff out of the lock and allow (just)
> > resuming the console to be delayed.  Some drivers (nouveau,
> > radeon/amdgpu, i915) do unnecessary stuff under the console lock:
> > 
> >  - nouveau: I *think* that cleraing/setting FBINFO_HWACCEL_DISABLED
> >    (nouveau_fbcon_accel_restore) is safe outside the lock as the fb is
> >    already suspended before clearing/after setting the flag.
> >  - radeon: since the console is suspended, I don't think that that all
> >    of the code is radeon_resume_kms is really needed.
> >  - amdgpu: same as radeon. Btw, console_lock is leaked on an error path.
> >  - i915: I think that clearing the fb memory can be done outside the
> >    lock too as the console is suspended.
> > 
> > Please correct me if my assumptions are flawed.
> 
> Yeah, fixing this independent issues should definitely help, irrespective
> of how we fix fb_set_suspend.
> 
> > > Besides this, when fixing a deadlock pls provide more details about the
> > > precise callchain and the locks involved in the deadlock. If you
> > > discovered this using lockdep, then just add the entire lockdep splat to
> > > the commit message. Otherwise there's lots of guesswork involved here.
> > > -Daniel
> > 
> > There was no lockdep splat, it was triggered via the ioctl in the commit
> > message. I'll include the verbose trace from the previous mail in the
> > next proposed patch to reduce hunting though.
> 
> Sounds good too.
> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel