On Tue, Oct 10, 2023 at 06:17:27PM +0200, Andi Shyti wrote:
> Hi Matt,
>
> > > > > FIXME: CAT errors are cropping up on MTL. This removes them,
> > > > > but the real root cause must still be diagnosed.
> > > >
> > > > Do you have a link to specific IGT test(s) that illustrate the CAT
> > > > errors so that we can ensure that they now appear fixed in CI?
> > >
> > > this one:
> > >
> > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_124599v1/bat-mtlp-8/igt@i915_selftest@live@xxxxxxxxxxxxxx
> > >
> > > Andi
> >
> > Wait, now I'm confused. That's a failure caused by a different patch
> > series (one that we won't be moving forward with). The live@hugepages
> > test is always passing on drm-tip today:
> > https://intel-gfx-ci.01.org/tree/drm-tip/igt@i915_selftest@live@xxxxxxxxxxxxxx
>
> yes, true, but that patch allows us to move forward with the
> testing and hit the CAT error.
>
> (it was the most reachable link I found :))
>
> > Is there a test that's giving CAT errors on drm-tip itself (even
> > sporadically) that we can monitor to see the impact of Jonathan's patch
> > here?
>
> Otherwise this one:
>
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13667/re-mtlp-3/igt@gem_exec_fence@xxxxxxxxxxxxx#dmesg-warnings11

Okay, looks like this is a pretty sporadic failure:

https://intel-gfx-ci.01.org/tree/drm-tip/igt@gem_exec_fence@parallel@xxxxxxxxx

so we'll need to monitor this for quite a while to make sure it's truly
gone. Assuming you've done enough local test cycles to confirm that
this definitely avoids the CAT errors,

Acked-by: Matt Roper <matthew.d.roper@xxxxxxxxx>

as a short-term mitigation while we debug further. We still need to
continue searching for a proper fix and/or drive this through the
hardware team and get them to document this as a new official workaround
for some kind of cache coherency problem.

BTW, it would also be good to have a patch that adds explicit handling
for GuC action 0x6000 (GUC_ACTION_GUC2HOST_NOTIFY_MEMORY_CAT_ERROR) so
that we'll at least have more meaningful error output if/when this is
encountered in the future.


Matt

>
> Andi

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation