Hi Matt,

On Mon, Dec 16, 2024 at 01:27:51PM -0800, Matt Roper wrote:
> On Thu, Dec 12, 2024 at 03:51:12PM +0100, Andi Shyti wrote:
> > On Fri, Dec 06, 2024 at 10:38:24AM -0500, Rodrigo Vivi wrote:
> > > On Thu, Dec 05, 2024 at 03:47:35PM +0000, Sebastian Brzezinka wrote:
> > > > `wa_verify` sporadically detects lost workaround on application; this
> > > > is unusual behavior since wa are applied at `intel_gt_init_hw` and
> > > > verified right away by `intel_gt_verify_workarounds`, and `wa_verify`
> > > > doesn't fail on initialization as one might suspect would happen.
> > > >
> > > > One approach that may be somewhat beneficial is to reapply workarounds
> > > > in the event of failure, or even get rid of verify on application,
> > > > since it's redundant to `intel_gt_verify_workarounds`.
> > > >
> > > > This patch aims to resolve: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12668
> > >
> > > It should be:
> > >
> > > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12668
> >
> > Apart from the formatting issues, this was suggested by me. We
> > have observed some sporadic failures in applying the specific
> > workaround added by Ville (now cc'ed to the thread) in commit
> > 0ddae025ab6c ("drm/i915: Disable compression tricks on JSL").
> >
> > Because it's sporadic, we could give it one more chance and try
> > to re-apply it.
>
> That sounds like it's just papering over the issue without really
> figuring out what's truly going on.

Independently of your next comments and of the implementation, the
sporadic errors we've seen occur at an extremely low rate, and we
weren't able to make sense of them or fully test them. We had a short
chat with Ville (who implemented the workaround), and he suggested
leaving it as it is, while I suggested giving it another chance.
> Looking at the current implementation, it looks like at least one
> possible problem is that it was implemented in rcs_engine_wa_init, but
> the CACHE_MODE_0 register itself is part of the LRC (according to bspec
> 18907). So we want to move it to icl_ctx_workarounds_init() instead to
> make sure it gets recorded in the golden context image. Our
> initialization and reset handling for workarounds touching registers in
> the context are different from those that aren't.
>
> BTW, I'm a bit surprised to see us needing to implement this workaround
> in the kernel at all. CACHE_MODE_0 is a register that's under userspace
> control (according to bspec 14181), so we usually expect the userspace
> drivers to own implementing any workarounds dealing with the registers
> they control. Indeed, it looks like Mesa's Iris driver already has an
> implementation of this workaround in iris_state.c:
>
>    if (devinfo->disable_ccs_repack) {
>       iris_emit_reg(batch, GENX(CACHE_MODE_0), reg) {
>          reg.DisableRepackingforCompression = true;
>          reg.DisableRepackingforCompressionMask = true;
>       }
>    }
>
> and that workaround was added back in mid-2019 so it should be in all
> recent Mesa releases.

Ville? Any comment here?

Andi