On Thu, Oct 13, 2016 at 04:57:39PM +0200, Daniel Vetter wrote:
> On Wed, Oct 12, 2016 at 10:05:19AM +0100, Chris Wilson wrote:
> > The error state is purposefully racy as we expect it to be called at any
> > time and so have avoided any locking whilst capturing the crash dump.
> > However, with multi-engine GPUs and multiple CPUs, those races can
> > manifest into OOPSes as we attempt to chase dangling pointers freed on
> > other CPUs. Under discussion are lots of ways to slow down normal
> > operation in order to protect the post-mortem error capture, but what if
> > we take the opposite approach and freeze the machine whilst the error
> > capture runs (note the GPU may still be running, but as long as we don't
> > process any of the results the driver's bookkeeping will be static).
> >
> > Note that by itself, this is not a complete fix. It also depends on
> > the compiler barriers in list_add/list_del to prevent traversing the
> > lists into the void. We also depend on only requiring state from
> > carefully controlled sources - i.e. all the state we require for
> > post-mortem debugging should be reachable from the request itself so
> > that we only have to worry about retrieving the request carefully. Once
> > we have the request, we know that all pointers from it are intact.
> >
> > v2: Avoid drm_clflush_pages() inside stop_machine() as it may use
> > stop_machine() itself for its wbinvd fallback.
>
> Hm, won't this hurt us real bad on any atom with ppgtt? Maybe a big
> gen check with a scary comment about why we can't call drm_clflush_pages
> on old machines? Iirc gen3+ should all be able to flush without
> stop_machine.

:) Patch 2 switched to using coherent reads through the GTT for all.
Everyone is now equal (and the nice part about that was that it
uncovered the WC bug from kernel 4.0!)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx