On Wed, Oct 11, 2023 at 02:01:56PM +0300, Tomas Winkler wrote:
> From: Alexander Usyskin <alexander.usyskin@xxxxxxxxx>
>
> Disable and enable mei-pxp client on errors to clean the internal state.

This broke i915 on my Alderlake-P laptop.

Trying to start Xorg just hangs and I eventually have to power off the
laptop to get things back into shape.

The behaviour gets a bit better after commit fb99e79ee62a ("mei: update
mei-pxp's component interface with timeouts") as Xorg "only" gets blocked
for ~10 seconds, after which it manages to start, and I get a bunch of
spew in dmesg:
[ 25.431535] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[ 30.435241] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[ 30.435965] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[ 30.437341] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[ 30.437356] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[28]
[ 35.555210] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[ 35.555919] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[ 35.555937] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg init arb session, ret=[-62]
[ 35.555941] i915 0000:00:02.0: [drm] *ERROR* tee cmd for arb session creation failed
[ 35.556765] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[ 36.021808] fuse: init (API version 7.39)
[ 40.675183] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[ 40.676045] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[ 40.676591] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[ 40.676602] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[28]
[ 40.960209] mate-session-ch[5936]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 45.795172] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[ 45.795872] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[ 45.796520] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[ 50.915183] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[ 50.916005] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[ 50.916012] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[-62]
[ 50.916846] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[ 56.035149] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...
[ 56.035956] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[ 56.036585] i915 0000:00:02.0: [drm] *ERROR* Failed to send PXP TEE message
[ 56.036592] i915 0000:00:02.0: [drm] *ERROR* Failed to send tee msg for inv-stream-key-15, ret=[28]
[ 61.155137] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: Trying to reset the channel...

The same spew repeats every time I run any application that uses the
GPU, and the application also gets blocked for a long time (eg. firefox
takes over 15 seconds to start now).

>
> Signed-off-by: Alexander Usyskin <alexander.usyskin@xxxxxxxxx>
> Signed-off-by: Tomas Winkler <tomas.winkler@xxxxxxxxx>
> ---
>  drivers/misc/mei/pxp/mei_pxp.c | 70 +++++++++++++++++++++++-----------
>  1 file changed, 48 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/misc/mei/pxp/mei_pxp.c b/drivers/misc/mei/pxp/mei_pxp.c
> index c6cdd6a47308ebcc72f34c38..9875d16445bb03efcfb31cd9 100644
> --- a/drivers/misc/mei/pxp/mei_pxp.c
> +++ b/drivers/misc/mei/pxp/mei_pxp.c
> @@ -23,6 +23,24 @@
>
>  #include "mei_pxp.h"
>
> +static inline int mei_pxp_reenable(const struct device *dev, struct mei_cl_device *cldev)
> +{
> +	int ret;
> +
> +	dev_warn(dev, "Trying to reset the channel...\n");
> +	ret = mei_cldev_disable(cldev);
> +	if (ret < 0)
> +		dev_warn(dev, "mei_cldev_disable failed. %d\n", ret);
> +	/*
> +	 * Explicitly ignoring disable failure,
> +	 * enable may fix the states and succeed
> +	 */
> +	ret = mei_cldev_enable(cldev);
> +	if (ret < 0)
> +		dev_err(dev, "mei_cldev_enable failed. %d\n", ret);
> +	return ret;
> +}
> +
>  /**
>   * mei_pxp_send_message() - Sends a PXP message to ME FW.
>   * @dev: device corresponding to the mei_cl_device
> @@ -35,6 +53,7 @@ mei_pxp_send_message(struct device *dev, const void *message, size_t size)
>  {
>  	struct mei_cl_device *cldev;
>  	ssize_t byte;
> +	int ret;
>
>  	if (!dev || !message)
>  		return -EINVAL;
> @@ -44,10 +63,20 @@ mei_pxp_send_message(struct device *dev, const void *message, size_t size)
>  	byte = mei_cldev_send(cldev, message, size);
>  	if (byte < 0) {
>  		dev_dbg(dev, "mei_cldev_send failed. %zd\n", byte);
> -		return byte;
> +		switch (byte) {
> +		case -ENOMEM:
> +			fallthrough;
> +		case -ENODEV:
> +			fallthrough;
> +		case -ETIME:
> +			ret = mei_pxp_reenable(dev, cldev);
> +			if (ret)
> +				byte = ret;
> +			break;
> +		}
>  	}
>
> -	return 0;
> +	return byte;
>  }
>
>  /**
> @@ -63,6 +92,7 @@ mei_pxp_receive_message(struct device *dev, void *buffer, size_t size)
>  	struct mei_cl_device *cldev;
>  	ssize_t byte;
>  	bool retry = false;
> +	int ret;
>
>  	if (!dev || !buffer)
>  		return -EINVAL;
> @@ -73,26 +103,22 @@ mei_pxp_receive_message(struct device *dev, void *buffer, size_t size)
>  	byte = mei_cldev_recv(cldev, buffer, size);
>  	if (byte < 0) {
>  		dev_dbg(dev, "mei_cldev_recv failed. %zd\n", byte);
> -		if (byte != -ENOMEM)
> -			return byte;
> -
> -		/* Retry the read when pages are reclaimed */
> -		msleep(20);
> -		if (!retry) {
> -			retry = true;
> -			goto retry;
> -		} else {
> -			dev_warn(dev, "No memory on data receive after retry, trying to reset the channel...\n");
> -			byte = mei_cldev_disable(cldev);
> -			if (byte < 0)
> -				dev_warn(dev, "mei_cldev_disable failed. %zd\n", byte);
> -			/*
> -			 * Explicitly ignoring disable failure,
> -			 * enable may fix the states and succeed
> -			 */
> -			byte = mei_cldev_enable(cldev);
> -			if (byte < 0)
> -				dev_err(dev, "mei_cldev_enable failed. %zd\n", byte);
> +		switch (byte) {
> +		case -ENOMEM:
> +			/* Retry the read when pages are reclaimed */
> +			msleep(20);
> +			if (!retry) {
> +				retry = true;
> +				goto retry;
> +			}
> +			fallthrough;
> +		case -ENODEV:
> +			fallthrough;
> +		case -ETIME:
> +			ret = mei_pxp_reenable(dev, cldev);
> +			if (ret)
> +				byte = ret;
> +			break;
>  		}
>  	}
>
> --
> 2.41.0
>

--
Ville Syrjälä
Intel