On Fri, Jun 23, 2023 at 02:05:20PM -0700, Lucas De Marchi wrote: > On Fri, Jun 23, 2023 at 12:48:13PM -0700, Kenneth Graunke wrote: > > On Friday, June 23, 2023 8:49:05 AM PDT Lucas De Marchi wrote: > > > On Thu, Jun 22, 2023 at 04:37:21PM -0700, Kenneth Graunke wrote: > > > >On Thursday, June 22, 2023 11:27:30 AM PDT Lucas De Marchi wrote: > > > >> Most of the context workarounds tweak masked registers, but not all. For > > > >> masked registers, when writing the value it's sufficient to just write > > > >> the wa->set_bits since that will take care of both the clr and set bits > > > >> as well as not overwriting other bits. > > > >> > > > >> However there are some workarounds, the registers are non-masked. Up > > > >> until now the driver was simply emitting a MI_LOAD_REGISTER_IMM with the > > > >> set_bits to program the register via the GPU in the WA bb. This has the > > > >> side effect of overwriting the content of the register outside of bits > > > >> that should be set and also doesn't handle the bits that should be > > > >> cleared. > > > >> > > > >> Kenneth reported that on DG2, mesa was seeing a weird behavior due to > > > >> the kernel programming of L3SQCREG5 in dg2_ctx_gt_tuning_init(). With > > > >> the GPU idle, that register could be read via intel_reg as 0x00e001ff, > > > >> but during a 3D workload it would change to 0x0000007f. So the > > > >> programming of that tuning was affecting more than the bits in > > > >> L3_PWM_TIMER_INIT_VAL_MASK. Matt Roper noticed the lack of rmw for the > > > >> context workarounds due to the use of MI_LOAD_REGISTER_IMM. > > > >> > > > >> So, for registers that are not masked, read its value via mmio, modify > > > >> and then set it in the buffer to be written by the GPU. This should take > > > >> care in a simple way of programming just the bits required by the > > > >> tuning/workaround. If in future there are registers that involved that > > > >> can't be read by the CPU, a more complex approach may be required like > > > >> a) issuing additional instructions to read and modify; or b) scan the > > > >> golden context and patch it in place before saving it; or something > > > >> else. But for now this should suffice. > > > >> > > > >> Scanning the context workarounds for all platforms, these are the > > > >> impacted ones with the respective registers > > > >> > > > >> mtl: DRAW_WATERMARK > > > >> mtl/dg2: XEHP_L3SQCREG5, XEHP_FF_MODE2 > > > >> gen12: GEN12_FF_MODE2 > > > > > > > >Speaking of GEN12_FF_MODE2...there's a big scary comment above that > > > >workaround write which says that register "will return the wrong value > > > >when read." I think with this patch, we'll start doing a RMW cycle for > > > >the register, which could mix in some of this "wrong value". The > > > >comment mentions that the intention is to write the whole register, > > > >as the default value is 0 for all fields. > > > > > > Good point. That also means we don't need to backport this patch to > > > stable kernel to any gen12, since overwritting the other bits is > > > actually the intended behavior. > > > > > > > > > > >Maybe what we want to do is change gen12_ctx_gt_tuning_init to do > > > > > > > > wa_write(wal, GEN12_FF_MODE2, FF_MODE2_TDS_TIMER_128); > > > > > > > >so it has a clear mask of ~0 instead of FF_MODE2_TDS_TIMER_MASK, and > > > > > > In order to ignore read back when verifying, we would still need to use > > > wa_add(), but changing the mask. We don't have a wa_write() that ends up > > > with { .clr = ~0, .read_mask = 0 }. > > > > > > wa_add(wal, > > > GEN12_FF_MODE2, > > > ~0, FF_MODE2_TDS_TIMER_128, > > > 0, false); > > > > Good point! Though, I just noticed another bug here: > > > > gen12_ctx_workarounds_init sets FF_MODE2_GS_TIMER_224 to avoid hangs > > in the HS/DS unit, after gen12_ctx_gt_tuning_init set TDS_TIMER_128 > > for performance. One of those is going to clobber the other; we're > > likely losing the TDS tuning today. Combining those workarounds into > > we are not losing it today. As long as the wa list is the same, we do detect collisions when > adding workarounds and they are coallesced before applying. However, > indeed if we change this to make clear be ~0, then they will collide and > we will see a warning. > > Applying them together in a single operation would indeed solve it > with a side-effect of moving this back to the workarounds. Either that > or > > a) we handle the read_back == 0 && clear == U32_MAX specially when > adding WAs. If that is true, then the check for collisions can > be adjusted to allow that. > > b) we give up on this approach and proceed with one of > > 1) scan the ctx wa list. If it has any non-masked register, > we submit a job to read it from the GPU side. MCR will > make this harder as the steering from the GPU side is > different than the CPU > > 2) emit additional commands to read and modify the register from > the GPU side > > 3) find the register in the golden context and patch it in place > > > > > > one place seems like an easy way to fix that. > > I'm leaning towards this option in the hope we don't have have > another GEN12_FF_MODE2 in future. > > Matt, we've been pushing towards separating the tuning from the WAs, but > here we'd go the other way. Anything against doing this for now? That's probably fine as long as we leave a comment behind in the tuning section explaining why that specific setting is found in a different spot. Matt > > thanks > Lucas De Marchi > > > > > > >then in this patch update your condition below from > > > > > > > >+ if (wa->masked_reg || wa->set == U32_MAX) { > > > > > > > >to > > > > > > > >+ if (wa->masked_reg || wa->set == U32_MAX || wa->clear == U32_MAX) { > > > > > > yeah... and maybe also warn if wa->read is 0, which means it's one > > > of the registers we can't/shouldn't read from the CPU. > > > > > > > > > > >because if we're clearing all bits then we don't care about doing a > > > >read-modify-write either. > > > > > > thanks > > > Lucas De Marchi > > > > > > > > > > >--Ken > > > > > > > > > > > > > -- Matt Roper Graphics Software Engineer Linux GPU Platform Enablement Intel Corporation