> -----Original Message-----
> From: Sousa, Gustavo <gustavo.sousa@xxxxxxxxx>
> Sent: Friday, October 6, 2023 2:57 PM
> To: Kahola, Mika <mika.kahola@xxxxxxxxx>; Vivi, Rodrigo <rodrigo.vivi@xxxxxxxxx>
> Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> Subject: RE: [PATCH] drm/i915/display: Reset message bus after each read/write operation
>
> Quoting Kahola, Mika (2023-10-06 03:49:15-03:00)
> >> -----Original Message-----
> >> From: Vivi, Rodrigo <rodrigo.vivi@xxxxxxxxx>
> >> Sent: Thursday, October 5, 2023 7:10 PM
> >> To: Sousa, Gustavo <gustavo.sousa@xxxxxxxxx>
> >> Cc: Kahola, Mika <mika.kahola@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH] drm/i915/display: Reset message bus after each read/write operation
> >>
> >> On Thu, Oct 05, 2023 at 12:40:35PM -0300, Gustavo Sousa wrote:
> >> > Quoting Rodrigo Vivi (2023-10-05 12:13:34-03:00)
> >> > >On Thu, Oct 05, 2023 at 03:05:31AM -0400, Kahola, Mika wrote:
> >> > >> > -----Original Message-----
> >> > >> > From: Vivi, Rodrigo <rodrigo.vivi@xxxxxxxxx>
> >> > >> > Sent: Wednesday, October 4, 2023 3:56 PM
> >> > >> > To: Kahola, Mika <mika.kahola@xxxxxxxxx>
> >> > >> > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> >> > >> > Subject: Re: [PATCH] drm/i915/display: Reset message bus after each read/write operation
> >> > >> >
> >> > >> > On Wed, Oct 04, 2023 at 01:25:04PM +0300, Mika Kahola wrote:
> >> > >> > > Every now and then we receive the following error when
> >> > >> > > running for example IGT test kms_flip.
> >> > >> > >
> >> > >> > > [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
> >> > >> > > [drm] *ERROR* PHY G Write 0d81 failed after 3 retries.
> >> > >> > >
> >> > >> > > Since the error is sporadic in nature, the patch proposes to
> >> > >> > > reset the message bus after every successful or unsuccessful
> >> > >> > > read or write operation.
> >> > >> > > However, testing revealed that this alone is not a
> >> > >> > > sufficient method; an additional delay of anything from
> >> > >> > > 200us to 300us is also introduced. This delay is an
> >> > >> > > experimental value and has no specification to back it up.
> >> > >> >
> >> > >> > have you tried the delays without the bus_reset?
> >> > >> Yes, we have bumped up the delay, first from 0x100 to 0x200 and
> >> > >> then, as per the BSpec change, to 0xa000, and I have also tried 0xf000.
> >> > >> Increasing the timeout reduces the frequency of this error but doesn't solve the issue.
> >> > >
> >> > >what is exactly this BSpec's 0xa000? where can I see it? So maybe
> >> > >you can update the message above removing the 'no specification to back it up'.
> >> >
> >> > (Resending this because I got a delivery failure notification)
> >> >
> >> > I think we are confusing "delay" with the "timeout parameter" of the msgbus.
> >> >
> >> > The PHY has a register to control the timeout parameter of msgbus
> >> > transactions (BSpec 65156). Its default value is 0x100. With
> >> > commit e028d7a4235d
> >> > ("drm/i915/cx0: Check and increase msgbus timeout threshold"), we
> >> > had integrated a workaround that bumped the timeout value to 0x200
> >> > in case timeouts were observed. Later on, there was a BSpec update
> >> > with the formal timeout value to be programmed to 0xa000, which was
> >> > incorporated with commit e35628968032
> >> > ("drm/i915/cx0: Add step for programming msgbus timer").
> >> >
> >> > I *believe* what Rodrigo has asked was about the usleep_range()
> >> > calls added with this patch, if we tried to only keep the usleep_range() without the bus reset.
> >>
> >> yes, that was my original question.
> >
> >I have no good explanation why usleep_range() is needed. Without it,
> >the kms_flip test eventually throws these read/write failures. As these
> >are a bit sporadic in nature, it takes some time to catch these errors.
>
> I think the question is whether the bus reset is really necessary. Maybe only the usleep_range() hack would be "enough" to
> mitigate the issue?

I have been scratching my head over this. I tested without the reset, leaving only the delay (i.e. usleep_range()) in place, but I still got these read/write failures. The same happened the other way around, with only the reset and no delay. It looks like we need both of them. I would like to understand why, or whether there is something else that we are missing from our sequence.

>
> --
> Gustavo Sousa
>
> >
> >The patch is a hack and my idea was to set message bus at reset state after each read/write operation.
> >Unfortunately, this alone is not enough to pass kms_flip without these dmesg errors on read/write.
> >However, the kms_flip test itself, which triggers these, passes without issues.
> >
> >And I forgot to mention that these errors show up (at least more
> >frequently) when 2x 4k monitors are connected. These may not be visible
> >with only one monitor connected. For such a system, I haven't been testing that much.
> >
> >-Mika-
> >
> >>
> >> >
> >> > --
> >> > Gustavo Sousa
> >> >
> >> > >
> >> > >Oh, and my english is bad, but it looks to me that 'empirical'
> >> > >might sound better than 'experimental' for this case, since you
> >> > >really did a lot of experiments before coming to this final conclusion.
> >> > >
> >> > >>
> >> > >> > have you talked to hw architects about this?
> >> > >> Yes, HW guys requested traces which I provided but based on
> >> > >> these the sequence we use in i915 is correct.
> >> > >>
> >> > >> >
> >> > >> > I wonder if we should add the delay inside the bus_reset itself?
> >> > >> > although the bit 15 clear check should be enough by itself and
> >> > >> > it doesn't look like it is a hw/fw reset involved to justify the extra delay.
> >> > >> That should be enough.
> >> > >> To me, it looks like when reading/writing to the bus, maybe too fast,
> >> > >> the hw cannot handle that and we need to reset and let things settle down before trying again.
> >> > >>
> >> > >> >
> >> > >> > well, at least some /* FIXME: */ or /* XXX: */ comments are
> >> > >> > desired along with the messages if we are going with this hack without understanding why...
> >> > >> True, I will add these to the patch.
> >> > >>
> >> > >> Thanks for review!
> >> > >>
> >> > >> -Mika-
> >> > >> >
> >> > >> > >
> >> > >> > > Signed-off-by: Mika Kahola <mika.kahola@xxxxxxxxx>
> >> > >> > > ---
> >> > >> > >  drivers/gpu/drm/i915/display/intel_cx0_phy.c | 6 ++++++
> >> > >> > >  1 file changed, 6 insertions(+)
> >> > >> > >
> >> > >> > > diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> >> > >> > > index abd607b564f1..a71b8a29d6b0 100644
> >> > >> > > --- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> >> > >> > > +++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> >> > >> > > @@ -220,9 +220,12 @@ static u8 __intel_cx0_read(struct drm_i915_private *i915, enum port port,
> >> > >> > >          /* 3 tries is assumed to be enough to read successfully */
> >> > >> > >          for (i = 0; i < 3; i++) {
> >> > >> > >                  status = __intel_cx0_read_once(i915, port, lane, addr);
> >> > >> > > +                intel_cx0_bus_reset(i915, port, lane);
> >> > >> > >
> >> > >> > >                  if (status >= 0)
> >> > >> > >                          return status;
> >> > >> > > +
> >> > >> > > +                usleep_range(200, 300);
> >> > >> > >          }
> >> > >> > >
> >> > >> > >          drm_err_once(&i915->drm, "PHY %c Read %04x failed after %d retries.\n",
> >> > >> > > @@ -299,9 +302,12 @@ static void __intel_cx0_write(struct drm_i915_private *i915, enum port port,
> >> > >> > >          /* 3 tries is assumed to be enough to write successfully */
> >> > >> > >          for (i = 0; i < 3; i++) {
> >> > >> > >                  status = __intel_cx0_write_once(i915, port, lane, addr, data, committed);
> >> > >> > > +                intel_cx0_bus_reset(i915, port, lane);
> >> > >> > >
> >> > >> > >                  if (status == 0)
> >> > >> > >                          return;
> >> > >> > > +
> >> > >> > > +                usleep_range(200, 300);
> >> > >> > >          }
> >> > >> > >
> >> > >> > >          drm_err_once(&i915->drm,
> >> > >> > > --
> >> > >> > > 2.34.1
> >> > >> > >
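
For reference, the control flow that the patch adds to the read path can be sketched in plain userspace C as below. This is only an illustration under stated assumptions, not driver code: struct fake_msgbus, fake_read_once(), fake_bus_reset(), fake_settle_delay() and fake_read() are hypothetical stand-ins for the i915 functions, the "bus" is a counter that fails a configurable number of transactions, and the delay is only counted instead of slept (the patch itself calls usleep_range(200, 300)). Returning -1 after the last failed try is also a choice made for this sketch.

```c
#include <assert.h>
#include <stdio.h>

#define MSGBUS_MAX_TRIES 3	/* same number of tries as in the patch */

/* Hypothetical stand-in for the PHY message bus; not an i915 structure. */
struct fake_msgbus {
	int fail_next;	/* how many upcoming transactions will fail */
	int resets;	/* number of bus resets issued so far */
	int delays;	/* number of settle delays taken so far */
};

/* One read attempt: a negative value means failure, otherwise the data. */
static int fake_read_once(struct fake_msgbus *bus, unsigned int addr)
{
	if (bus->fail_next > 0) {
		bus->fail_next--;
		return -1;
	}
	return (int)(addr & 0xff);	/* arbitrary fake register content */
}

/* Stand-in for the bus reset; here it is only counted. */
static void fake_bus_reset(struct fake_msgbus *bus)
{
	bus->resets++;
}

/* Stand-in for usleep_range(200, 300); here it is only counted. */
static void fake_settle_delay(struct fake_msgbus *bus)
{
	bus->delays++;
}

/*
 * Read with retries: reset the bus after every attempt, successful or
 * not, and take the extra delay only before trying again.
 */
static int fake_read(struct fake_msgbus *bus, unsigned int addr)
{
	assert(bus != NULL);

	for (int i = 0; i < MSGBUS_MAX_TRIES; i++) {
		int status = fake_read_once(bus, addr);

		fake_bus_reset(bus);

		if (status >= 0)
			return status;

		fake_settle_delay(bus);
	}

	fprintf(stderr, "Read %04x failed after %d retries.\n",
		addr, MSGBUS_MAX_TRIES);
	return -1;
}
```

With fail_next set to 2, a call such as fake_read(&bus, 0x0d80) succeeds on the third try after three resets and two delays. The point of the sketch is only the ordering: the reset is unconditional, while the delay sits on the retry path, which matches what was tested above where neither of the two alone was enough.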