Quoting Rodrigo Vivi (2023-10-05 12:13:34-03:00) >On Thu, Oct 05, 2023 at 03:05:31AM -0400, Kahola, Mika wrote: >> > -----Original Message----- >> > From: Vivi, Rodrigo <rodrigo.vivi@xxxxxxxxx> >> > Sent: Wednesday, October 4, 2023 3:56 PM >> > To: Kahola, Mika <mika.kahola@xxxxxxxxx> >> > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx >> > Subject: Re: [PATCH] drm/i915/display: Reset message bus after each read/write operation >> > >> > On Wed, Oct 04, 2023 at 01:25:04PM +0300, Mika Kahola wrote: >> > > Every know and then we receive the following error when running for >> > > example IGT test kms_flip. >> > > >> > > [drm] *ERROR* PHY G Read 0d80 failed after 3 retries. >> > > [drm] *ERROR* PHY G Write 0d81 failed after 3 retries. >> > > >> > > Since the error is sporadic in nature, the patch proposes to reset the >> > > message bus after every successful or unsuccessful read or write >> > > operation. However, testing revealed that this alone is not sufficient >> > > method an additiona delay is also introduces anything from 200us to >> > > 300us. This delay is experimental value and has no specification to >> > > back it up. >> > >> > have you tried the delays without the bus_reset? >> Yes, we have bumped up the delay, first from 0x100 to 0x200 and then as per >> BSpec change 0xa000 and I have tried 0xf000. Increasing the timeout reduces >> the frequency of this error but doesn't solve this issue. > >what is exactly this BSPec's 0xa000? where can I see it? So maybe you can >update the message above removing the 'no specification to back it up'. (Resending this because I got a delivery failure notification) I think we are confusing "delay" with the "timeout parameter" of the msgbus. The PHY has a register to control the timeout parameter of msgbus transactions (BSpec 65156). It's default value is 0x100. With commit e028d7a4235d ("drm/i915/cx0: Check and increase msgbus timeout threshold"), we had integrated a workaround that bumped the timeout value to 0x200 in case timeouts were observed. Later on, there was a BSpec update with the formal timeout value to be programmed to 0xa000, which was incorporated with commit e35628968032 ("drm/i915/cx0: Add step for programming msgbus timer"). I *believe* what Rodrigo has asked was about the usleep_range() calls added with this patch, if we tried to only keep the usleed_range() without the bus reset. -- Gustavo Sousa > >Oh, and my english is bad, but it looks to me that 'empirical' might >sound better than 'experimental' for this case, since you really did >a lot of experiments before coming to this final conclusion. > >> >> > have you talked to hw architects about this? >> Yes, HW guys requested traces which I provided but based on these the sequence we use in i915 >> is correct. >> >> > >> > I wonder if we should add the delay inside the bus_reset itself? >> > although the bit 15 clear check should be enough by itself and it doesn't look like it is a hw/fw reset involved to justify the extra >> > delay. >> That should be enough. To me, it looks like when reading/writing to the bus maybe too fast, the hw cannot handle that and we need >> to reset and let things settle down before trying again. >> >> > >> > well, at least some /* FIXME: */ or /* XXX: */ comments is desired along with the messages if we are going with this hack without >> > understanding why... >> True, I will add these the the patch. >> >> Thanks for review! >> >> -Mika- >> > >> > > >> > > Signed-off-by: Mika Kahola <mika.kahola@xxxxxxxxx> >> > > --- >> > > drivers/gpu/drm/i915/display/intel_cx0_phy.c | 6 ++++++ >> > > 1 file changed, 6 insertions(+) >> > > >> > > diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c >> > > b/drivers/gpu/drm/i915/display/intel_cx0_phy.c >> > > index abd607b564f1..a71b8a29d6b0 100644 >> > > --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c >> > > +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c >> > > @@ -220,9 +220,12 @@ static u8 __intel_cx0_read(struct drm_i915_private *i915, enum port port, >> > > /* 3 tries is assumed to be enough to read successfully */ >> > > for (i = 0; i < 3; i++) { >> > > status = __intel_cx0_read_once(i915, port, lane, addr); >> > > + intel_cx0_bus_reset(i915, port, lane); >> > > >> > > if (status >= 0) >> > > return status; >> > > + >> > > + usleep_range(200, 300); >> > > } >> > > >> > > drm_err_once(&i915->drm, "PHY %c Read %04x failed after %d >> > > retries.\n", @@ -299,9 +302,12 @@ static void __intel_cx0_write(struct drm_i915_private *i915, enum port port, >> > > /* 3 tries is assumed to be enough to write successfully */ >> > > for (i = 0; i < 3; i++) { >> > > status = __intel_cx0_write_once(i915, port, lane, addr, data, >> > > committed); >> > > + intel_cx0_bus_reset(i915, port, lane); >> > > >> > > if (status == 0) >> > > return; >> > > + >> > > + usleep_range(200, 300); >> > > } >> > > >> > > drm_err_once(&i915->drm, >> > > -- >> > > 2.34.1 >> > >