Quoting Daniel Vetter (2017-08-30 13:23:56)
> On Tue, Aug 29, 2017 at 03:59:36PM +0100, Chris Wilson wrote:
> > Quoting Joonas Lahtinen (2017-08-29 15:54:06)
> > > On Tue, 2017-08-29 at 11:33 +0100, Chris Wilson wrote:
> > > > Since we hold the device wakeref when writing through the GTT (otherwise
> > > > the writes would fail), we presumed that before the device sleeps those
> > > > writes would naturally be flushed and that we wouldn't need our mmio
> > > > read trick. However, that presumption seems false and a sleepy bxt seems
> > > > to require us to always manually flush the GTT writes prior to direct
> > > > access.
> > > >
> > > > Fixes: e2a2aa36a509 ("drm/i915: Check we have an wake device before flushing GTT writes")
> > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > >
> > > Got any Bugzilla, Testcase, Tested-by?
> >
> > Original bugzilla hasn't been reopened, so it looks like they were
> > happy enough with the original patches that fixed the problem on my bxt.
> > The testcase seems to be very system dependent, my suspicion is that it
> > has to do with the wacky runtime pm exhibited by CI bxt.
>
> CI bxt doesn't have displays, which means we shut down a lot more when
> it's running. Does this indicate a huge gem test gap where we should run
> plenty of gem testcases with all the outputs shut down?

This one is hard to tell since we are guessing at how the hw actually
works. Strong PCI ordering it is not.

If we take this example at face value, the key point of failure was
rpm_get_if_in_use, so we could simply say that we need to ensure that
all such branches are exercised, with varying amounts of stress since we
are looking for a random hw delay. At the moment that boils down to the
shrinker avoiding unbinding anything whilst the device is idle,
pushing us closer to oom (with kswapd hopefully riding to the rescue,
and we all know how unreliable kswapd is).
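
To make the rpm_get_if_in_use branch concrete, here is a toy userspace
model, not i915 code. Every name in it (toy_dev, toy_rpm_get_if_in_use,
toy_flush_if_in_use, toy_flush_always) is hypothetical, and the single
gtt_dirty flag is an assumed stand-in for GTT writes the hw has not yet
flushed. It only illustrates why both outcomes of the conditional
wakeref need to be exercised.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, not i915 code: all names are hypothetical. */
struct toy_dev {
	int wakeref_count;	/* > 0 means someone holds the device awake */
	bool gtt_dirty;		/* assumed stand-in for unflushed GTT writes */
	int wakeups;		/* times we had to wake an idle device */
};

/* Take a wakeref only if the device is already in use. */
static bool toy_rpm_get_if_in_use(struct toy_dev *dev)
{
	if (dev->wakeref_count == 0)
		return false;
	dev->wakeref_count++;
	return true;
}

/* Take a wakeref unconditionally, waking the device if it was idle. */
static void toy_rpm_get(struct toy_dev *dev)
{
	if (dev->wakeref_count++ == 0)
		dev->wakeups++;
}

static void toy_rpm_put(struct toy_dev *dev)
{
	assert(dev->wakeref_count > 0);
	dev->wakeref_count--;
}

/* Conditional variant: skips the flush when the device looks idle. */
static void toy_flush_if_in_use(struct toy_dev *dev)
{
	if (!toy_rpm_get_if_in_use(dev))
		return;	/* the branch that presumes idle implies flushed */
	dev->gtt_dirty = false;
	toy_rpm_put(dev);
}

/* Unconditional variant: always wake, then flush. */
static void toy_flush_always(struct toy_dev *dev)
{
	toy_rpm_get(dev);
	dev->gtt_dirty = false;
	toy_rpm_put(dev);
}
```

In this model, calling toy_flush_if_in_use() on an idle device with
gtt_dirty set returns early and leaves the flag set, while
toy_flush_always() clears it at the cost of one wakeup. A test that only
ever runs with the device awake never reaches the early return.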

The wacky part of CI suspend seems to be that there's no reason for the
device to wake at times; quite a few of the tests are just burning cycles
without touching the hw and we still have the constant stream of
suspend/resume. I'm very suspicious that we are waking up too often (and
that it takes too long, about 28ms including the hpd of a headless
machine).

> Or just the need to add a pile more tests to pm_rpm?

No. It's just your regular combinatorial explosion. The approach I would
take here would be to register a sysenter callback that attempts an rpm
suspend (i.e. so ~every ioctl would start from idle, controlled via the
fault-injection framework), then run the minimal test set that
exercises all ioctl paths, and then expand to all driver branches. First
we need coverage feedback.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx