Re: [PATCH] drm/i915: make sure power domains don't get disabled during error capture

Imre Deak <imre.deak@xxxxxxxxx> · Thu, 28 Nov 2013 15:53:37 +0200

On Thu, 2013-11-28 at 14:34 +0100, Daniel Vetter wrote:
> On Thu, Nov 28, 2013 at 12:52:07PM +0200, Imre Deak wrote:
> > On Thu, 2013-11-28 at 11:05 +0100, Daniel Vetter wrote:
> > > On Thu, Nov 28, 2013 at 10:04:42AM +0200, Imre Deak wrote:
> > > > So far we had a race during error capturing where a power domain could
> > > > get disabled after we queried its state to be on. Fix this by protecting
> > > > the power domain state tracking changes with a lock and holding this
> > > > lock during error capturing.
> > > > 
> > > > Note that this still allows the case where we are halfway in enabling a
> > > > power domain and still report it as disabled. This isn't a problem as we
> > > > only want to avoid register accesses while the domain is off and that is
> > > > now guaranteed.
> > > > 
> > > > Signed-off-by: Imre Deak <imre.deak@xxxxxxxxx>
> > > > 
> > > > [ Applies on top of "drm/i915: add intel_display_power_enabled_sw() for
> > > > use in atomic ctx" ]
> > > 
> > > Tbh I don't see the point for this - the error capture code is inherently
> > > racy, we don't grab any of the other modeset locks. We also don't care and
> > > furthermore we should try to grab as few locks as possible to increase the
> > > chances that we can actually capture the error state successfully. Every
> > > lock you add adds another depency and so more ways to end in tears and
> > > deadlocks.
> > >
> > > So if the racy error capture is the only reason I'll reject this.
> > 
> > Yes it is only the race this fixes, but I'd like to argue for its
> > usefulness. It fixes in particular the following two things:
> > 
> > 1. avoiding unclaimed register accesses, which was the original goal of 
> >     
> > commit 9d1cb9147dbe45f6e94dc796518ecf67cb64b359
> > Author: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx>
> > Date:   Fri Nov 1 13:32:08 2013 -0200
> > 
> > drm/i915: avoid unclaimed registers when capturing the error state
> 
> Hm, I've thought we won't hit any unclaimed register warnings if there's
> no concurrent/racing other thread doing stuff to the power wells?

We'll hit it if we read a register while its power well is off, but we
think it's on based on SW tracking. That's possible for example if you
get one of the ERROR_INTERRUPTS and call into the error capture code and
this happens while someone is disabling the power well.

> And as explained I don't care that much about racing hangcheck ...
>
> > 2. avoiding an inconsistent power on/off value in the error state wrt.
> > to the captured registers. Without the lock we can have an error state
> > showing the power was on, but register values captured with power off,
> > with some unpredictable values. This consistency check was the whole
> > point of adding the power on/off value to the error state.
> > 
> > I understand that adding complexity in general is risky, but in this
> > case the risk is minimal. The only place we take the lock is to adjust a
> > counter and to read HW registers, without any further nested locks, so
> > no chance for lock inversions.
> 
> The thing is that we already have tons of other similar races, e.g. we
> could capture hilariously incosistent modeset state where the pfit is
> disabled but pipesrc already set to something that would need upscaling.
> And it's not a problem. So I still think we can just ignore those races,
> even more we /should/ do so.

Ok, I wanted to reduce the uncertainty. But if we don't care about this
and having occasional unclaimed reg access reports are ok then I'm ok to
ignore this patch.

--Imre

Attachment:
signature.asc

Description: This is a digitally signed message part
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx