Re: [PATCH 2/3] drm/i915/gt: Fix context workarounds with non-masked regs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Friday, June 23, 2023 8:49:05 AM PDT Lucas De Marchi wrote:
> On Thu, Jun 22, 2023 at 04:37:21PM -0700, Kenneth Graunke wrote:
> >On Thursday, June 22, 2023 11:27:30 AM PDT Lucas De Marchi wrote:
> >> Most of the context workarounds tweak masked registers, but not all. For
> >> masked registers, when writing the value it's sufficient to just write
> >> the wa->set_bits since that will take care of both the clr and set bits
> >> as well as not overwriting other bits.
> >>
> >> However there are some workarounds, the registers are non-masked. Up
> >> until now the driver was simply emitting a MI_LOAD_REGISTER_IMM with the
> >> set_bits to program the register via the GPU in the WA bb. This has the
> >> side effect of overwriting the content of the register outside of bits
> >> that should be set and also doesn't handle the bits that should be
> >> cleared.
> >>
> >> Kenneth reported that on DG2, mesa was seeing a weird behavior due to
> >> the kernel programming of L3SQCREG5 in dg2_ctx_gt_tuning_init(). With
> >> the GPU idle, that register could be read via intel_reg as 0x00e001ff,
> >> but during a 3D workload it would change to 0x0000007f. So the
> >> programming of that tuning was affecting more than the bits in
> >> L3_PWM_TIMER_INIT_VAL_MASK. Matt Roper noticed the lack of rmw for the
> >> context workarounds due to the use of MI_LOAD_REGISTER_IMM.
> >>
> >> So, for registers that are not masked, read its value via mmio, modify
> >> and then set it in the buffer to be written by the GPU. This should take
> >> care in a simple way of programming just the bits required by the
> >> tuning/workaround. If in future there are registers that involved that
> >> can't be read by the CPU, a more complex approach may be required like
> >> a) issuing additional instructions to read and modify; or b) scan the
> >> golden context and patch it in place before saving it; or something
> >> else. But for now this should suffice.
> >>
> >> Scanning the context workarounds for all platforms, these are the
> >> impacted ones with the respective registers
> >>
> >> 	mtl: DRAW_WATERMARK
> >> 	mtl/dg2: XEHP_L3SQCREG5, XEHP_FF_MODE2
> >> 	gen12: GEN12_FF_MODE2
> >
> >Speaking of GEN12_FF_MODE2...there's a big scary comment above that
> >workaround write which says that register "will return the wrong value
> >when read."  I think with this patch, we'll start doing a RMW cycle for
> >the register, which could mix in some of this "wrong value".  The
> >comment mentions that the intention is to write the whole register,
> >as the default value is 0 for all fields.
> 
> Good point. That also means we don't need to backport this patch to
> stable kernel to any gen12, since overwritting the other bits is
> actually the intended behavior.
> 
> >
> >Maybe what we want to do is change gen12_ctx_gt_tuning_init to do
> >
> >    wa_write(wal, GEN12_FF_MODE2, FF_MODE2_TDS_TIMER_128);
> >
> >so it has a clear mask of ~0 instead of FF_MODE2_TDS_TIMER_MASK, and
> 
> In order to ignore read back when verifying, we would still need to use
> wa_add(), but changing the mask. We don't have a wa_write() that ends up
> with { .clr = ~0, .read_mask = 0 }.
> 
> 	wa_add(wal,
> 	       GEN12_FF_MODE2,
> 	       ~0, FF_MODE2_TDS_TIMER_128,
> 	       0, false);

Good point!  Though, I just noticed another bug here:

gen12_ctx_workarounds_init sets FF_MODE2_GS_TIMER_224 to avoid hangs
in the HS/DS unit, after gen12_ctx_gt_tuning_init set TDS_TIMER_128
for performance.  One of those is going to clobber the other; we're
likely losing the TDS tuning today.  Combining those workarounds into
one place seems like an easy way to fix that.

> >then in this patch update your condition below from
> >
> >+		if (wa->masked_reg || wa->set == U32_MAX) {
> >
> >to
> >
> >+		if (wa->masked_reg || wa->set == U32_MAX || wa->clear == U32_MAX) {
> 
> yeah... and maybe also warn if wa->read is 0, which means it's one
> of the registers we can't/shouldn't read from the CPU.
> 
> >
> >because if we're clearing all bits then we don't care about doing a
> >read-modify-write either.
> 
> thanks
> Lucas De Marchi
> 
> >
> >--Ken
> 
> 
> 

Attachment: signature.asc
Description: This is a digitally signed message part.


[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux