On Tue, Oct 10, 2023 at 05:42:28PM +0100, Tvrtko Ursulin wrote:
>
> On 10/10/2023 17:17, Andi Shyti wrote:
> > Hi Matt,
> >
> > > > > > FIXME: CAT errors are cropping up on MTL. This removes them,
> > > > > > but the real root cause must still be diagnosed.
> > > > >
> > > > > Do you have a link to specific IGT test(s) that illustrate the CAT
> > > > > errors so that we can ensure that they now appear fixed in CI?
> > > >
> > > > this one:
> > > >
> > > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_124599v1/bat-mtlp-8/igt@i915_selftest@live@xxxxxxxxxxxxxx
> > > >
> > > > Andi
> > >
> > > Wait, now I'm confused. That's a failure caused by a different patch
> > > series (one that we won't be moving forward with). The live@hugepages
> > > test is always passing on drm-tip today:
> > > https://intel-gfx-ci.01.org/tree/drm-tip/igt@i915_selftest@live@xxxxxxxxxxxxxx
> >
> > yes, true, but that patch allows us to move forward with the
> > testing and hit the CAT error.
> >
> > (it was the most reachable link I found :))
> >
> > > Is there a test that's giving CAT errors on drm-tip itself (even
> > > sporadically) that we can monitor to see the impact of Jonathan's patch
> > > here?
> >
> > Otherwise this one:
> >
> > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13667/re-mtlp-3/igt@gem_exec_fence@xxxxxxxxxxxxx#dmesg-warnings11
>
> Parachuting in on a tangent - please do not mix CAT and CT errors. CAT,
> for me at least, associates with CATastrophic faults reported over CT
> channel, like GuC page faulting IIRC.
>
> For CT errors maybe GuC folks can sched some light what they mean.

0x6000 is GUC_ACTION_GUC2HOST_NOTIFY_MEMORY_CAT_ERROR so this actually
is a CAT error, delivered via the CT channel.


Matt

>
> Regards,
>
> Tvrtko

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation