> From: Ethan Zhao <haifeng.zhao@xxxxxxxxxxxxxxx> > Sent: Tuesday, January 30, 2024 5:13 PM > > On 1/30/2024 4:43 PM, Tian, Kevin wrote: > >> From: Ethan Zhao <haifeng.zhao@xxxxxxxxxxxxxxx> > >> Sent: Tuesday, January 30, 2024 4:16 PM > >> > >> On 1/30/2024 2:22 PM, Tian, Kevin wrote: > >>> Here we need consider two situations. > >>> > >>> One is that the device is not bound to a driver or bound to a driver > >>> which doesn't do active work to the device when it's removed. In > >>> that case one may observe the timeout situation only in the removal > >>> path as the stack dump in your patch02 shows. > >> When iommu_bus_notifier() got called for hotplug removal cases to > >> flush devTLB (ATS invalidation), driver was already unloaded. > >> whatever safe removal or surprise removal. so in theory no active > >> driver working there. > >> > >> pciehp_ist() > >> pciehp_disable_slot() > >> remove_board() > >> pciehp_unconfigure_device() > >> pci_stop_and_remove_bus_device() > >> pci_stop_bus_device()--->here unload driver > >> pci_remove_bus_device()->here qi_flush_dev_iotlb() got called. > > yes, so patch02 can fix this case. > > > >>> patch02 can fix that case by checking whether the device is present > >>> to skip sending the invalidation requests. So the logic being discussed > >>> here doesn't matter. > >>> > >>> The 2nd situation is more tricky. The device might be bound to > >>> a driver which is doing active work to the device with in-fly > >>> ATS invalidation requests. In this case in-fly requests must be aborted > >>> before the driver can be detached from the removed device. > Conceptually > >>> a device is removed from the bus only after its driver is detached. > >> Some tricky situations: > >> > >> 1. The ATS invalidation request is issued from driver driver, while it is > >> in handling, device is removed. this momment, the device instance still > >> exists in the bus list. yes, if searching it by BDF, could get it. > > it's searchable between the point where the device is removed and the > > point where the driver is unloaded: > > > > CPU0 CPU1 > > (Driver is active) (pciehp handler) > > qi_submit_sync() pciehp_ist() > > ... ... > > loop for completion() { pciehp_unconfigure_device() > > ... pci_dev_set_disconnected() > > if (ITE) { ... > > //find pci_dev from sid pci_remove_bus_device() > > if (pci_dev_is_connected()) device_del() > > break; bus_remove_device() > > } device_remove_driver() > > If the device was hot plugin or re-scanned, the device has a PCI_DEV_ADDED > flag, in this case is pci_dev_is_disconnected() true or false? how is this patch supposed to work with it? > if so the driver unloading work isn't defered to the tail of device_del(), it > is unloaded before pci_remove_bus_device()->device_del(), in pci_stop_dev > > pci_stop_bus_device() > pci_stop_dev() > { > if (pci_dev_is_added(dev)) { > device_release_driver(&dev->dev); > } no matter where driver unload is requested, it needs to wait for aborting in-fly request on CPU0. > > So the interval the device is searchable, only applied to those devices > not hot plugged, or never be scanned. > and in the worst case even if pci_dev is not searchable, isn't it already an indicator that the device is absent then qi_submit_sync() should just exit upon ITE?