On Wed, Oct 16, 2013 at 10:06:27AM -0700, Ben Widawsky wrote:
> On Wed, Oct 16, 2013 at 05:58:31PM +0100, Chris Wilson wrote:
> > On Wed, Oct 16, 2013 at 09:21:30AM -0700, Ben Widawsky wrote:
> > > Once the machine gets to a certain point in the suspend process, we
> > > expect the GPU to be idle. If it is not, we might corrupt memory.
> > > Empirically (with an early version of this patch) we have seen this is
> > > not the case. We cannot currently explain why the latent GPU writes
> > > occur.
> > >
> > > In the technical sense, this patch is a workaround in that we have an
> > > issue we can't explain, and the patch indirectly solves the issue.
> > > However, it's really better than a workaround because we understand why
> > > it works, and it really should be a safe thing to do in all cases.
> > >
> > > The noticeable effect other than the debug messages would be an increase
> > > in the suspend time. I have not measured how expensive it actually is.
> > >
> > > I think it would be good to spend further time to root cause why we're
> > > seeing these latent writes, but it shouldn't preclude preventing the
> > > fallout.
> > >
> > > NOTE: It should be safe (and makes some sense IMO) to also keep the
> > > VALID bit unset on resume when we clear_range(). I've opted not to do
> > > this as properly clearing those bits at some later point would be extra
> > > work.
> > >
> > > v2: Fix bugzilla link
> >
> > And the other one?
> >
>
> I'm really amazing. If we move ahead with this patch, Daniel, can you just erase
> the extra bugs.freedesktop.org/6549://
>
> > > Bugzilla: http://bugs.freedesktop.org/6549://bugs.freedesktop.org/show_bug.cgi?id=65496
> > Bugzilla: http://bugs.freedesktop.org/show_bug.cgi?id=65496

Fixed and merged with cc: stable.
-Daniel

> > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=59321
> > > Tested-by: Takashi Iwai <tiwai@xxxxxxx>
> > > Tested-by: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx>
> > > Signed-off-by: Ben Widawsky <ben@xxxxxxxxxxxx>
> >
> > So clearing the valid bit should result in the GPU reporting errors for
> > delayed accesses, but none were reported?
> > -Chris
> >
>
> So I can't actually reproduce the problem for some reason. Paulo will
> need to answer. One theory is the fault information is lost on suspend.
>
> The original patch put faults both in suspend, and resume. After this, I
> asked Paulo to wedge the GPU, and there I saw faults.
>
> --
> Ben Widawsky, Intel Open Source Technology Center
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch