[cc -= unrelated mailing lists bpf, kernel-hardening] On Tue, Oct 17, 2023 at 10:25:39AM +0000, Ricky WU wrote: > > On Mon, Oct 16, 2023 at 12:01:31PM +0800, Kai-Heng Feng wrote: > > > When inserting an SD7.0 card to Realtek card reader, it can trigger > > > PCI slot Link down and causes the following error: > > > > Why does *inserting* a card cause a Link Down? > > > > > > > [ 63.898861] pcieport 0000:00:1c.0: pciehp: Slot(8): Link Down > > > [ 63.912118] BUG: unable to handle page fault for address: > > ffffb24d403e5010 > > [...] > > > [ 63.912198] ? asm_exc_page_fault+0x27/0x30 > > > [ 63.912203] ? ioread32+0x2e/0x70 > > > [ 63.912206] ? rtsx_pci_write_register+0x5b/0x90 [rtsx_pci] > > > [ 63.912217] rtsx_set_l1off_sub+0x1c/0x30 [rtsx_pci] > > > [ 63.912226] rts5261_set_l1off_cfg_sub_d0+0x36/0x40 [rtsx_pci] > > > [ 63.912234] rtsx_pci_runtime_idle+0xc7/0x160 [rtsx_pci] > > > [ 63.912243] ? __pfx_pci_pm_runtime_idle+0x10/0x10 > > > [ 63.912246] pci_pm_runtime_idle+0x34/0x70 > > > [ 63.912248] rpm_idle+0xc4/0x2b0 > > > [ 63.912251] pm_runtime_work+0x93/0xc0 > > > [ 63.912254] process_one_work+0x21a/0x430 > > > [ 63.912258] worker_thread+0x4a/0x3c0 > > > > This looks like pcr->remap_addr is accessed after it has been iounmap'ed in > > rtsx_pci_remove() or before it has been iomap'ed in rtsx_pci_probe(). > > > > Is the card reader itself located below a hotplug port and unplugged here? > > Yes it is card reader unplug itself, because sd7.0 card is not used > rtsx_pcr, it use nvme driver I can only guess here as the dmesg and lspci output I requested hasn't been provided: I assume that this card reader may contain a PCIe switch with a regular SD card reader below a first Downstream Port (which is hotplug-capable). If an SD express card is inserted, the regular SD card reader disappears from the Downstream Port and the inserted card appears as an NVMe drive (possibly below a second Downstream Port, or below the first Downstream Port). The commit message is somewhat misleading in that it links the unhandled page fault to card insertion. That may be the trigger, but the root cause appears to be that a runtime PM request is performed asynchronously after the SD card reader has iounmap'ed pcr->remap_addr. If that is indeed the root cause (as I suspect), the fix needs to be placed somewhere else. pciehp is only one of several hotplug drivers and fixing this only in pciehp may leave the other hotplug drivers vulnerable to the same issue. Unfortunately the information provided so far is insufficient to recommend a better fix. It would be necessary to debug why the asynchronous RPM request is not canceled even though rtsx_pci_remove() runtime resumes the device before iounmap'ing pcr->remap_addr. Perhaps there's a bug in runtime PM code that causes asynchronous requests to not be canceled upon a pm_runtime_get_sync()? Or perhaps a wmb() is necessary in pci_device_remove() after setting pci_dev->driver = NULL? Thanks, Lukas