Re: [PATCH] drm/i915/mtl: Clear possible sticky bits on PICA message bus

"Kahola, Mika" <mika.kahola@xxxxxxxxx> · Fri, 3 Nov 2023 15:00:30 +0000

> -----Original Message-----
> From: Kahola, Mika <mika.kahola@xxxxxxxxx>
> Sent: Friday, November 3, 2023 4:47 PM
> To: Kahola, Mika <mika.kahola@xxxxxxxxx>; Sousa, Gustavo <gustavo.sousa@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Nikula, Jani <jani.nikula@xxxxxxxxx>; Syrjala, Ville <ville.syrjala@xxxxxxxxx>
> Subject: RE: [PATCH] drm/i915/mtl: Clear possible sticky bits on PICA message bus
> 
> > -----Original Message-----
> > From: Intel-gfx <intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
> > Kahola, Mika
> > Sent: Thursday, November 2, 2023 4:54 PM
> > To: Sousa, Gustavo <gustavo.sousa@xxxxxxxxx>;
> > intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > Cc: Nikula, Jani <jani.nikula@xxxxxxxxx>; Syrjala, Ville
> > <ville.syrjala@xxxxxxxxx>
> > Subject: Re:  [PATCH] drm/i915/mtl: Clear possible sticky
> > bits on PICA message bus
> >
> > > -----Original Message-----
> > > From: Sousa, Gustavo <gustavo.sousa@xxxxxxxxx>
> > > Sent: Thursday, November 2, 2023 4:23 PM
> > > To: Kahola, Mika <mika.kahola@xxxxxxxxx>;
> > > intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > Cc: Kahola, Mika <mika.kahola@xxxxxxxxx>; Nikula, Jani
> > > <jani.nikula@xxxxxxxxx>; Syrjala, Ville <ville.syrjala@xxxxxxxxx>
> > > Subject: Re: [PATCH] drm/i915/mtl: Clear possible sticky bits on
> > > PICA message bus
> > >
> > > Quoting Mika Kahola (2023-11-01 07:31:01-03:00)
> > > >It is possible that sticky bits or error bits are left on message
> > > >bus status register. Reading and then writing the value back to
> > > >messagebus status register clears all possible sticky bits and errors.
> > > >
> > > >Signed-off-by: Mika Kahola <mika.kahola@xxxxxxxxx>
> > > >---
> > > > drivers/gpu/drm/i915/display/intel_cx0_phy.c | 14 ++++++++++++++
> > > > 1 file changed, 14 insertions(+)
> > > >
> > > >diff --git a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > > >b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > > >index b2ad4c6172f6..f439f0c7b400 100644
> > > >--- a/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > > >+++ b/drivers/gpu/drm/i915/display/intel_cx0_phy.c
> > > >@@ -195,6 +195,13 @@ static int __intel_cx0_read_once(struct drm_i915_private *i915, enum port port,
> > > >                 return -ETIMEDOUT;
> > > >         }
> > > >
> > > >+        /*
> > > >+         * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> > > >+         * any error sticky bits set from previous transactions
> > > >+         */
> > > >+        val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> > > >+        intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port,
> > > >+ lane), val);
> > > >+
> > > >         intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
> > > >                        XELPDP_PORT_M2P_TRANSACTION_PENDING |
> > > >                        XELPDP_PORT_M2P_COMMAND_READ | @@ -262,6
> > > >+269,13 @@ static int __intel_cx0_write_once(struct
> > > >+drm_i915_private *i915, enum port port,
> > > >                 return -ETIMEDOUT;
> > > >         }
> > > >
> > > >+        /*
> > > >+         * write XELPDP_PORT_P2M_MSGBUS_STATUS register after read to clear
> > > >+         * any error sticky bits set from previous transactions
> > > >+         */
> > > >+        val = intel_de_read(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port, lane));
> > > >+        intel_de_write(i915, XELPDP_PORT_P2M_MSGBUS_STATUS(port,
> > > >+ lane), val);
> > > >+
> > >
> > > Looking at the current state of the code, looks like to me that we already clear the bits from both the "success" and "failure"
> > paths.
> > > For the "success"
> > > paths, that is done by a direct call to
> > > intel_clear_response_ready_flag(); for the "failure" case, the call
> > > to
> > > intel_clear_response_ready_flag() is done as part of intel_cx0_bus_reset().
> > >
> > > Thus, considering that we start using the msgbus from a clean state,
> > > maybe these extra steps are not necessary? Have you tried adding a
> > > call to
> > > intel_cx0_bus_reset() as part of intel_cx0_phy_transaction_begin()?
> > That I haven't try to reset bus at the stage. I can give it a go and
> > check what happens. To me it seems, that we are sometimes stuck at
> > waiting the ack and eventually we time out and bail out. I have no
> > clue why this happens from time to time. We already reset the bus after every read/write operation. In addition, a small delay
> seem to help and clear the sporadic read/write failures to the bus. However, this is more like a workaround and I'm not really
> happy about this sort of hack.
> >
> > I will give a go for your suggestion and come back once I have the results.
> 
> I ran a test with intel_cx0_bus_reset() when placed in intel_cx0_phy_transaction_begin(). This didn't help either as with kms_flip
> IGT test I was able to trigger this read failure
> 
> [drm] *ERROR* PHY G Read 0d80 failed after 3 retries.
> 
> This was with the configuration where the test vehicle had 4K and eDP monitors connected.

I think we can ignore this patch. I was able to trigger this error with this patch applied as well. This doesn't fix the issue either.

Sorry for the noise.

-Mika-

> 
> 
> >
> > Thanks!
> > -Mika-
> >
> > >
> > > Also, I think it would be good if we understood better were those uncleared bits are coming from...
> > >
> > > --
> > > Gustavo Sousa
> > >
> > > >         intel_de_write(i915, XELPDP_PORT_M2P_MSGBUS_CTL(port, lane),
> > > >                        XELPDP_PORT_M2P_TRANSACTION_PENDING |
> > > >                        (committed ? XELPDP_PORT_M2P_COMMAND_WRITE_COMMITTED :
> > > >--
> > > >2.34.1
> > > >