Re: [bug report] nvme not re-recognize when changed M.2 SSD in S3 mode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[cc += Mika]

On Tue, Apr 09, 2024 at 05:44:38AM +0000, Ricky WU wrote:
> I find some problem in S3 mode maybe S2idle has the same situation.
> The current situation is: Enter S3 mode I unplugged a correctly recognized
> Kingston A2000 250GB and switched it for a Intel 760p 256GB, when back to
> S0 there is also Kingston A2000 still installed.
> It did not call pciehp_ist(), pciehp_handle_presence_or_link_change() when
> back to S0, I don't know if this is the reason.

IIUC, you've got a regular PC with an ASRock H370M Pro 4 mainboard
which you suspended to S3, replaced the SSD and resumed.

And your point is that Linux doesn't notice the SSD was changed
during suspend.

For comparison, I think with Thunderbolt, a hotplug event is signaled
even during system sleep.  That's probably not possible on a Root Port
that's not powered in S3, despite it being hotplug-capable.

My guess is that S0ix would indeed behave differently here.  But it's
probably not safe to replace the SSD while the system is powered.

What we could do is check the Vendor ID and Device ID of the device
in the slot and compare it to what's cached in its struct pci_dev.
Below is a patch which does that.  Does it fix the issue for you?

This is just a heuristic, a poor man's way to detect a device change.

We could try to improve it by also checking the Subsystem Vendor ID
and Device ID, but that's only present in a Type 0 Config Space Header.
We could also try to check the Class Code and Revision ID, but it's
doubtful whether that's much of an improvement.
There's also the Device Serial Number, but it's optional and we're
not caching it right now.

If the device was replaced with an identical one (same Vendor ID and
Device ID), it's probably fine not to re-enumerate it.  If it was
indeed powered off (which is likely if the Root Port was powered off
as well), its driver will re-initialize it on resume and it will be
just as if it wasn't replaced.

Conceivably, the device driver might apply quirks based on the
Revision ID, Subsystem Vendor / Device ID or something else.
In that case it may handle the device incorrectly after resume
because it's not re-enumerated.  Again, this patch is just a
heuristic but probably an improvement on the status quo.

Anyway, here's the patch:

-- >8 --

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ddd55ad..ff19985 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -152,6 +152,25 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
 	return 0;
 }
 
+static bool pciehp_device_replaced(struct controller *ctrl)
+{
+	struct pci_dev *pdev;
+	u32 reg;
+
+	pdev = pci_get_slot(ctrl->pcie->port->subordinate, PCI_DEVFN(0, 0));
+	if (!pdev)
+		return true;
+
+	if (!pci_bus_read_dev_vendor_id(ctrl->pcie->port->subordinate,
+					PCI_DEVFN(0, 0), &reg, 0))
+		return true;
+
+	if (reg != (pdev->vendor | (pdev->device << 16)))
+		return true;
+
+	return false;
+}
+
 /**
  * pciehp_check_presence() - synthesize event if presence has changed
  * @ctrl: controller to check
@@ -172,7 +191,8 @@ static void pciehp_check_presence(struct controller *ctrl)
 
 	occupied = pciehp_card_present_or_link_active(ctrl);
 	if ((occupied > 0 && (ctrl->state == OFF_STATE ||
-			  ctrl->state == BLINKINGON_STATE)) ||
+			      ctrl->state == BLINKINGON_STATE ||
+			      pciehp_device_replaced(ctrl))) ||
 	    (!occupied && (ctrl->state == ON_STATE ||
 			   ctrl->state == BLINKINGOFF_STATE)))
 		pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux