Patch "cxl: Remove racy attempt to force EEH invocation in reset" has been added to the 4.2-stable tree

<gregkh@xxxxxxxxxxxxxxxxxxx> · Tue, 22 Sep 2015 21:14:54 -0700

This is a note to let you know that I've just added the patch titled

    cxl: Remove racy attempt to force EEH invocation in reset

to the 4.2-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     cxl-remove-racy-attempt-to-force-eeh-invocation-in-reset.patch
and it can be found in the queue-4.2 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.


From 9d8e27673c45927fee9e7d8992ffb325a6b0b0e4 Mon Sep 17 00:00:00 2001
From: Daniel Axtens <dja@xxxxxxxxxx>
Date: Fri, 21 Aug 2015 17:25:15 +1000
Subject: cxl: Remove racy attempt to force EEH invocation in reset

commit 9d8e27673c45927fee9e7d8992ffb325a6b0b0e4 upstream.

cxl_reset currently PERSTs the slot, and then repeatedly tries to
read MMIO space in order to kick off EEH.

There are two problems with this: it's unnecessary, and it's racy.

It's unnecessary because the PERST will bring down the PHB link.
That will be picked up by the CAPP, which will send out an HMI.
Skiboot, noticing an HMI from the CAPP, will send an OPAL
notification to the kernel, which will trigger EEH recovery.

It's also racy: the EEH recovery triggered by the CAPP will
eventually cause the MMIO space to have its mapping invalidated
and the pointer NULLed out. This races with our attempt to read
the MMIO space. This is causing OOPSes in testing.
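
The interleaving described above can be replayed deterministically in
user space. The following is a minimal sketch, not kernel code: the
struct mock_adapter type, the mock_register variable, the
poll_would_oops() function and its recovery_at parameter are all
hypothetical names introduced for illustration. It assumes the oops
comes from the polling loop reading through p1_mmio after EEH recovery
has NULLed it, and it models the concurrent recovery as a step that
fires at a chosen loop iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cxl: only the field the race touches. */
struct mock_adapter {
	const volatile uint32_t *p1_mmio; /* NULL once recovery unmaps it */
};

/* Backing storage playing the role of the mapped MMIO register. */
static uint32_t mock_register = 0x12345678;

/*
 * Replays the shape of the removed polling loop against the mock adapter.
 *
 * recovery_at is the loop iteration at which the simulated EEH recovery
 * invalidates the mapping and NULLs the pointer. It is a hypothetical knob
 * for picking an interleaving; in the real race the timing is not
 * controllable. Pass a negative value for "recovery never runs".
 *
 * Returns 1 if the loop would have read through a NULL pointer (the oops),
 * 0 if it terminated without touching an invalidated mapping.
 */
static int poll_would_oops(struct mock_adapter *adapter, int recovery_at)
{
	uint32_t val;
	int i = 0;

	for (;;) {
		/* Simulated EEH recovery running alongside the loop. */
		if (i == recovery_at)
			adapter->p1_mmio = NULL;

		/* The removed code performed this read unconditionally. */
		if (adapter->p1_mmio == NULL)
			return 1;
		val = *adapter->p1_mmio;

		/* Same exit condition as the removed while loop. */
		if (val == 0xffffffff || i >= 5)
			return 0;

		/* msleep(500) in the removed code; omitted in this model. */
		i++;
	}
}
```

With the register reading anything other than 0xffffffff, calling
poll_would_oops() with recovery_at set to 2 reports the faulting read,
while a negative recovery_at lets the loop exhaust its retries safely.
Adding a NULL check to the real loop would not close the window, since
recovery could still run between the check and the read, which is
consistent with the patch dropping the loop entirely.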

Simply drop all the attempts to force EEH detection, and trust
that Skiboot will send the notification and that we'll act on it.
The Skiboot code to send the EEH notification has been in Skiboot
for as long as CAPP recovery has been supported, so we don't need
to worry about breaking obscure setups with ancient firmware.

Cc: Ryan Grimm <grimm@xxxxxxxxxxxxxxxxxx>
Fixes: 62fa19d4b4fd ("cxl: Add ability to reset the card")
Signed-off-by: Daniel Axtens <dja@xxxxxxxxxx>
Acked-by: Ian Munsie <imunsie@xxxxxxxxxxx>
Signed-off-by: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

---
 drivers/misc/cxl/pci.c |   16 ----------------
 1 file changed, 16 deletions(-)

--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -851,8 +851,6 @@ int cxl_reset(struct cxl *adapter)
 {
 	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
 	int rc;
-	int i;
-	u32 val;
 
 	dev_info(&dev->dev, "CXL reset\n");
 
@@ -869,20 +867,6 @@ int cxl_reset(struct cxl *adapter)
 		return rc;
 	}
 
-	/* the PERST done above fences the PHB.  So, reset depends on EEH
-	 * to unbind the driver, tell Sapphire to reinit the PHB, and rebind
-	 * the driver.  Do an mmio read explictly to ensure EEH notices the
-	 * fenced PHB.  Retry for a few seconds before giving up. */
-	i = 0;
-	while (((val = mmio_read32be(adapter->p1_mmio)) != 0xffffffff) &&
-		(i < 5)) {
-		msleep(500);
-		i++;
-	}
-
-	if (val != 0xffffffff)
-		dev_err(&dev->dev, "cxl: PERST failed to trigger EEH\n");
-
 	return rc;
 }
 


Patches currently in stable-queue which might be from dja@xxxxxxxxxx are

queue-4.2/cxl-remove-racy-attempt-to-force-eeh-invocation-in-reset.patch
queue-4.2/cxl-fix-unbalanced-pci_dev_get-in-cxl_probe.patch
queue-4.2/cxl-allow-release-of-contexts-which-have-been-opened-but-not-started.patch