Hi Lukas, We use SDEX card replace M.2 nvme card because we don't have enough M.2 nvme card We tried this patch as below situation: 1.keep the SDEX card enter S3 then resume ->PASS the video can continue playing, not see the msg "device replace during system sleep" 2. on S3 mode insert the SDEX card then resume -> PASS Can identify the card and can see the msg "device replace during system sleep" 3.on S3 mode remove the SDEX card then resume -> PASS No card show on system and can see the msg "device replace during system sleep" 4.replace card on S3 mode (different brand) ->PASS Can identify the second card and can see the msg "device replace during system sleep" 5.replace card on S3 mode (same brand and same capacity) ->FAIL Can not see the msg "device replace during system sleep" Against 5 we found a issue, most card not declare capability "PCI_EXT_CAP_ID_DSN", even if there is the capability the config space value are 0, cause pci_get_dsn() return 0 normally. However these cards can show the SN on CrystalDiskInfo. below is the card pci config space log and lsblk disk info. Sandisk Corp Device 5007 (prog-if 02 [NVM Express]) this card can see the "PCI_EXT_CAP_ID_DSN" capability But value is 0 -------------------------------------------------------------------------------------------------------------------- cr@cr-desktop:~$ sudo lspci -s 02:00.0 -vvv 02:00.0 Non-Volatile memory controller: Sandisk Corp Device 5007 (prog-if 02 [NVM Express]) Subsystem: Sandisk Corp Device 5007 Physical Slot: 8 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 16 Region 0: Memory at a3b00000 (64-bit, non-prefetchable) [size=16K] Region 4: Memory at a3b04000 (64-bit, non-prefetchable) [size=256] Capabilities: [80] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [90] MSI: Enable- Count=1/32 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [b0] MSI-X: Enable+ Count=17 Masked- Vector table: BAR=0 offset=00002000 PBA: BAR=4 offset=00000000 Capabilities: [c0] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 unlimited ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+ RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 256 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend- LnkCap: Port #0, Speed 8GT/s, Width x1, ASPM L1, Exit Latency L1 <8us ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt- LnkSta: Speed 8GT/s (ok), Width x1 (ok) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range B, TimeoutDis+, NROPrPrP-, LTR+ 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS-, TPHComp-, ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled AtomicOpsCtl: ReqEn- LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+ EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest- Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 Capabilities: [150 v1] Device Serial Number 00-00-00-00-00-00-00-00 Capabilities: [1b8 v1] Latency Tolerance Reporting Max snoop latency: 3145728ns Max no snoop latency: 3145728ns Capabilities: [300 v1] Secondary PCI Express LnkCtl3: LnkEquIntrruptEn-, PerformEqu- LaneErrStat: 0 Capabilities: [900 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+ PortCommonModeRestoreTime=32us PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=90112ns L1SubCtl2: T_PwrOn=44us Kernel driver in use: nvme Kernel modules: nvme -------------------------------------------------------------------------------------------------------------------- Here can see the Serial Number is "03969d74164000000000" -------------------------------------------------------------------------------------------------------------------- cr@cr-desktop:~$ lsblk --nodeps -o name,size,type,mountpoint,serial NAME SIZE TYPE MOUNTPOINT SERIAL loop0 4K loop /snap/bare/5 loop1 55.7M loop /snap/core18/2785 loop2 55.7M loop /snap/core18/2790 loop3 63.5M loop /snap/core20/1974 loop4 91.7M loop /snap/gtk-common-themes/1535 loop5 53.3M loop /snap/snapd/19457 loop6 485.5M loop /snap/gnome-42-2204/120 loop7 65.1M loop /snap/gtk-common-themes/1515 loop8 73.9M loop /snap/core22/858 loop9 12.3M loop /snap/snap-store/959 loop10 485.5M loop /snap/gnome-42-2204/126 loop11 219M loop /snap/gnome-3-34-1804/72 loop12 51M loop /snap/snap-store/547 loop13 218.4M loop /snap/gnome-3-34-1804/93 loop14 63.9M loop /snap/sublime-text/122 loop15 321.1M loop /snap/vlc/3721 loop16 65.2M loop /snap/sublime-text/118 sda 232.9G disk 210350450505 nvme0n1 236.1G disk 03969d74164000000000 -------------------------------------------------------------------------------------------------------------------- we tried the 4 SDEX cards and 2 M.2 Nvme, either it is not declared or the value is 0 all the cards I have. RIcky > Below please find a reworked patch which checks the Device Serial Number in > addition to various other device identity information in config space. > > I've moved the check for a replaced device to the ->resume_noirq phase, i.e. > before the device driver's first access to the device. > I'm also marking the device removed to prevent the driver from accessing it. > > Does this fix the issue for you? > > If it does reliably detect a replaced device, could you also double-check the > opposite case, i.e. if the device is *not* replaced during system sleep? > > I think this is about as much as we can do at the PCI layer to detect a replaced > device. Drivers may have additional ways to identify a device (such as > reading a WWID from some register) and we could consider providing a library > function which drivers could call if they detect a replaced device on resume. > > If you set CONFIG_DYNAMIC_DEBUG=y and add the following to the command > line... > > pciehp.pciehp_debug=1 dyndbg="file pciehp* +p" log_buf_len=10M > ignore_loglevel > > ...you should see "device replaced during system sleep" messages on resume if > a replaced device is detected. > > -- >8 -- > > From: Lukas Wunner <lukas@xxxxxxxxx> > Subject: [PATCH] PCI: pciehp: Detect device replacement during system sleep > > Ricky reports that replacing a device in a hotplug slot during ACPI sleep state > S3 does not cause re-enumeration on resume, as one would expect. Instead, > the new device is treated as if it was the old one. > > There is no bulletproof way to detect device replacement, but as a heuristic, > check whether the device identity in config space matches cached data in > struct pci_dev (Vendor ID, Device ID, Class Code, Revision ID, Subsystem > Vendor ID, Subsystem ID). Additionally, cache and compare the Device Serial > Number (PCIe r6.2 sec 7.9.3). If a mismatch is detected, mark the old device > disconnected (to prevent its driver from accessing the new device) and > synthesize a Presence Detect Changed event. > > Reported-by: Ricky Wu <ricky_wu@xxxxxxxxxxx> > Closes: > https://lore.kernel.org/r/a608b5930d0a48f092f717c0e137454b@xxxxxxxxxxx > Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx> > --- > drivers/pci/hotplug/pciehp.h | 4 ++++ > drivers/pci/hotplug/pciehp_core.c | 42 > ++++++++++++++++++++++++++++++++++++++- > drivers/pci/hotplug/pciehp_hpc.c | 5 +++++ > drivers/pci/hotplug/pciehp_pci.c | 4 ++++ > 4 files changed, 54 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h index > e0a614a..273dd8c 100644 > --- a/drivers/pci/hotplug/pciehp.h > +++ b/drivers/pci/hotplug/pciehp.h > @@ -46,6 +46,9 @@ > /** > * struct controller - PCIe hotplug controller > * @pcie: pointer to the controller's PCIe port service device > + * @dsn: cached copy of Device Serial Number of Function 0 in the hotplug > slot > + * (PCIe r6.2 sec 7.9.3); used to determine whether a hotplugged device > + * was replaced with a different one during system sleep > * @slot_cap: cached copy of the Slot Capabilities register > * @inband_presence_disabled: In-Band Presence Detect Disable supported > by > * controller and disabled per spec recommendation (PCIe r5.0, > appendix I > @@ -87,6 +90,7 @@ > */ > struct controller { > struct pcie_device *pcie; > + u64 dsn; > > u32 slot_cap; /* capabilities and > quirks */ > unsigned int inband_presence_disabled:1; diff --git > a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c > index ddd55ad..ff458e6 100644 > --- a/drivers/pci/hotplug/pciehp_core.c > +++ b/drivers/pci/hotplug/pciehp_core.c > @@ -284,6 +284,32 @@ static int pciehp_suspend(struct pcie_device *dev) > return 0; > } > > +static bool pciehp_device_replaced(struct controller *ctrl) { > + struct pci_dev *pdev __free(pci_dev_put); > + u32 reg; > + > + pdev = pci_get_slot(ctrl->pcie->port->subordinate, PCI_DEVFN(0, 0)); > + if (!pdev) > + return true; > + > + if (pci_read_config_dword(pdev, PCI_VENDOR_ID, ®) || > + reg != (pdev->vendor | (pdev->device << 16)) || > + pci_read_config_dword(pdev, PCI_CLASS_REVISION, ®) || > + reg != (pdev->revision | (pdev->class << 8))) > + return true; > + > + if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL && > + (pci_read_config_dword(pdev, PCI_SUBSYSTEM_VENDOR_ID, > ®) || > + reg != (pdev->subsystem_vendor | (pdev->subsystem_device > << 16)))) > + return true; > + > + if (pci_get_dsn(pdev) != ctrl->dsn) > + return true; > + > + return false; > +} > + > static int pciehp_resume_noirq(struct pcie_device *dev) { > struct controller *ctrl = get_service_data(dev); @@ -293,9 +319,23 > @@ static int pciehp_resume_noirq(struct pcie_device *dev) > ctrl->cmd_busy = true; > > /* clear spurious events from rediscovery of inserted card */ > - if (ctrl->state == ON_STATE || ctrl->state == BLINKINGOFF_STATE) > + if (ctrl->state == ON_STATE || ctrl->state == BLINKINGOFF_STATE) > + { > pcie_clear_hotplug_events(ctrl); > > + /* > + * If hotplugged device was replaced with a different one > + * during system sleep, mark the old device disconnected > + * (to prevent its driver from accessing the new device) > + * and synthesize a Presence Detect Changed event. > + */ > + if (pciehp_device_replaced(ctrl)) { > + ctrl_dbg(ctrl, "device replaced during system > sleep\n"); > + pci_walk_bus(ctrl->pcie->port->subordinate, > + pci_dev_set_disconnected, > NULL); > + pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); > + } > + } > + > return 0; > } > #endif > diff --git a/drivers/pci/hotplug/pciehp_hpc.c > b/drivers/pci/hotplug/pciehp_hpc.c > index b1d0a1b3..061f01f 100644 > --- a/drivers/pci/hotplug/pciehp_hpc.c > +++ b/drivers/pci/hotplug/pciehp_hpc.c > @@ -1055,6 +1055,11 @@ struct controller *pcie_init(struct pcie_device > *dev) > } > } > > + pdev = pci_get_slot(subordinate, PCI_DEVFN(0, 0)); > + if (pdev) > + ctrl->dsn = pci_get_dsn(pdev); > + pci_dev_put(pdev); > + > return ctrl; > } > > diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c > index ad12515..65e50be 100644 > --- a/drivers/pci/hotplug/pciehp_pci.c > +++ b/drivers/pci/hotplug/pciehp_pci.c > @@ -72,6 +72,10 @@ int pciehp_configure_device(struct controller *ctrl) > pci_bus_add_devices(parent); > down_read_nested(&ctrl->reset_lock, ctrl->depth); > > + dev = pci_get_slot(parent, PCI_DEVFN(0, 0)); > + ctrl->dsn = pci_get_dsn(dev); > + pci_dev_put(dev); > + > out: > pci_unlock_rescan_remove(); > return ret; > -- > 2.43.0