Hi Nitin,

> > > we see an issue where resets fails because the engine resumes from an
> > > incorrect RING_HEAD. Since the RING_HEAD doesn't point to the
> > > remaining requests to re-run, but may instead point into the
> > > uninitialised portion of the ring, the GPU may be then fed invalid
> > > instructions from a privileged context, oft pushing the GPU into an
> > > unrecoverable hang.
> > >
> > > If at first the write doesn't succeed, try, try again.
> > >
> > > v2: Avoid unnecessary timeout macro (Andi)
> > >
> > > v3: Correct comment format (Andi)
> > >
> > > v4: Make it generic for all platform as it won't impact (Chris)
> > >
> > > Link: https://gitlab.freedesktop.org/drm/intel/-/issues/5432
> > > Testcase: igt/i915_selftest/hangcheck
> >
> > The referenced HSW-specific gitlab issue was closed in 2022 and hadn't been
> > active for a while before that. This patch from Chris was originally posted as an
> > attachment on that gitlab issue asking if it helped, but nobody responded that it
> > did/didn't improve the situation so it may or may not have been relevant to
> > what was originally reported in that ticket.
> >
> > Looking in cibuglog, the most similar failures I see today are the ones getting
> > associated with issue #12310. I.e.,
> >
> > <3> [220.415493] i915 0000:00:02.0: [drm] *ERROR* failed to set rcs0
> > head to zero ctl 00000000 head 00001db8 tail 00000000 start 7fffa000
> >
> > Are you trying to solve that CI issue or is there a different user-submitted report
> > somewhere that this patch is trying to address?
> >
> >
> > Matt
> >
>
> Yes. This patch is for https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12310
> I will update the link.

No worries, I can update the link here.

Reviewed-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>

Thanks,
Andi