On Thu, 2013-11-28 at 14:34 +0100, Daniel Vetter wrote: > On Thu, Nov 28, 2013 at 12:52:07PM +0200, Imre Deak wrote: > > On Thu, 2013-11-28 at 11:05 +0100, Daniel Vetter wrote: > > > On Thu, Nov 28, 2013 at 10:04:42AM +0200, Imre Deak wrote: > > > > So far we had a race during error capturing where a power domain could > > > > get disabled after we queried its state to be on. Fix this by protecting > > > > the power domain state tracking changes with a lock and holding this > > > > lock during error capturing. > > > > > > > > Note that this still allows the case where we are halfway in enabling a > > > > power domain and still report it as disabled. This isn't a problem as we > > > > only want to avoid register accesses while the domain is off and that is > > > > now guaranteed. > > > > > > > > Signed-off-by: Imre Deak <imre.deak@xxxxxxxxx> > > > > > > > > [ Applies on top of "drm/i915: add intel_display_power_enabled_sw() for > > > > use in atomic ctx" ] > > > > > > Tbh I don't see the point for this - the error capture code is inherently > > > racy, we don't grab any of the other modeset locks. We also don't care and > > > furthermore we should try to grab as few locks as possible to increase the > > > chances that we can actually capture the error state successfully. Every > > > lock you add adds another depency and so more ways to end in tears and > > > deadlocks. > > > > > > So if the racy error capture is the only reason I'll reject this. > > > > Yes it is only the race this fixes, but I'd like to argue for its > > usefulness. It fixes in particular the following two things: > > > > 1. avoiding unclaimed register accesses, which was the original goal of > > > > commit 9d1cb9147dbe45f6e94dc796518ecf67cb64b359 > > Author: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx> > > Date: Fri Nov 1 13:32:08 2013 -0200 > > > > drm/i915: avoid unclaimed registers when capturing the error state > > Hm, I've thought we won't hit any unclaimed register warnings if there's > no concurrent/racing other thread doing stuff to the power wells? We'll hit it if we read a register while its power well is off, but we think it's on based on SW tracking. That's possible for example if you get one of the ERROR_INTERRUPTS and call into the error capture code and this happens while someone is disabling the power well. > And as explained I don't care that much about racing hangcheck ... > > > 2. avoiding an inconsistent power on/off value in the error state wrt. > > to the captured registers. Without the lock we can have an error state > > showing the power was on, but register values captured with power off, > > with some unpredictable values. This consistency check was the whole > > point of adding the power on/off value to the error state. > > > > I understand that adding complexity in general is risky, but in this > > case the risk is minimal. The only place we take the lock is to adjust a > > counter and to read HW registers, without any further nested locks, so > > no chance for lock inversions. > > The thing is that we already have tons of other similar races, e.g. we > could capture hilariously incosistent modeset state where the pfit is > disabled but pipesrc already set to something that would need upscaling. > And it's not a problem. So I still think we can just ignore those races, > even more we /should/ do so. Ok, I wanted to reduce the uncertainty. But if we don't care about this and having occasional unclaimed reg access reports are ok then I'm ok to ignore this patch. --Imre
Attachment:
signature.asc
Description: This is a digitally signed message part
_______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx