On Tue, Mar 19, 2024 at 08:50:11AM -0600, Alex Williamson wrote:
> On Tue, 19 Mar 2024 12:04:09 +0000
> Mikhail Malyshev <mike.malyshev@xxxxxxxxx> wrote:
>
> > When trying to run a VM with PCI passthrough of intel-eth-pci ETH device
> > QEMU fails with "Permission denied" error. This happens only if
> > intel-eth-pci driver is dynamically unbound from the device using
> > "echo -n $DEV > /sys/bus/pci/drivers/stmmac/unbind" command. If
> > "vfio-pci.ids=..." is used to bind the device to vfio-pci driver and the
> > device is never probed by intel-eth-pci driver the problem does not occur.
> >
> > When intel-eth-pci driver is dynamically unbound from the device
> > .remove()
> >   intel_eth_pci_remove()
> >     stmmac_dvr_remove()
> >       pm_runtime_disable();
>
> Why isn't the issue in intel-eth-pci?
>
> For example stmmac_dvr_remove() does indeed call pm_runtime_disable()
> unconditionally, but stmmac_dvr_probe() only conditionally calls
> pm_runtime_enable() with logic like proposed here for vfio-pci. Isn't
> it this conditional enabling which causes an unbalanced disable depth
> that's the core of the problem?
>

The common code in the stmmac driver is shared by the PCI and non-PCI
drivers, and it doesn't handle this correctly. That condition is
actually wrong.

> It doesn't seem like it should be the responsibility of the next driver
> to correct the state from the previous driver. You've indicated that
> the device works with vfio-pci if there's no previous driver, so
> clearly intel-eth-pci isn't leaving the device in the same runtime pm
> state that it found it. Thanks,

Yes, I agree. I was confused by a number of drivers calling
pm_runtime_disable() in their remove() function, but those are not PCI
drivers. Unfortunately, the runtime PM documentation is not very clear
on this topic. I'll submit another patch for the driver.

Are there any subsystems other than PCI that call
pm_runtime_enable/disable? Right now my patch for the driver skips
these calls only in the PCI case.

BR,
Mikhail

>
> Alex
>
> > Later when QEMU tries to get the device file descriptor by calling
> > VFIO_GROUP_GET_DEVICE_FD ioctl pm_runtime_resume_and_get returns -EACCES.
> > It happens because dev->power.disable_depth == 1.
> >
> > vfio_group_fops_unl_ioctl(VFIO_GROUP_GET_DEVICE_FD)
> >   vfio_group_ioctl_get_device_fd()
> >     vfio_device_open()
> >       ret = device->ops->open_device()
> >         vfio_pci_open_device()
> >           vfio_pci_core_enable()
> >             ret = pm_runtime_resume_and_get();
> >
> > This behavior was introduced by
> > commit 7ab5e10eda02 ("vfio/pci: Move the unused device into low power state with runtime PM")
> >
> > This may be the case for any driver calling pm_runtime_disable() in its
> > .remove() callback.
> >
> > The case when a runtime PM may be disable for a device is not handled so we
> > call pm_runtime_enable() in vfio_pci_core_register_device to re-enable it.
> >
> > Mikhail Malyshev (1):
> >   vfio/pci: Reenable runtime PM for dynamically unbound devices
> >
> >  drivers/vfio/pci/vfio_pci_core.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > --
> > 2.34.1
> >
>
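
The unbalanced disable depth discussed in this thread can be sketched
as a small userspace model. This is not kernel code: every toy_* name
and the bus_manages_pm flag are hypothetical, and the model only
tracks a counter in the spirit of dev->power.disable_depth. It
assumes the bus core has already enabled runtime PM before probe
(depth 0), and it does not reproduce the actual condition used in
stmmac_dvr_probe().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a device's runtime PM state (hypothetical type). */
struct toy_dev {
	int disable_depth;	/* > 0 means runtime PM is disabled */
	bool bus_manages_pm;	/* assumption: bus core owns enable/disable */
};

static void toy_pm_runtime_enable(struct toy_dev *d)
{
	if (d->disable_depth > 0)
		d->disable_depth--;
}

static void toy_pm_runtime_disable(struct toy_dev *d)
{
	d->disable_depth++;
}

/* Mirrors the failure reported in the thread: -EACCES while disabled. */
static int toy_pm_runtime_resume_and_get(struct toy_dev *d)
{
	return d->disable_depth > 0 ? -EACCES : 0;
}

/* Probe enables runtime PM only conditionally. */
static void toy_probe(struct toy_dev *d)
{
	if (!d->bus_manages_pm)
		toy_pm_runtime_enable(d);
}

/* Unconditional disable: unbalanced when probe skipped the enable. */
static void toy_remove_unbalanced(struct toy_dev *d)
{
	toy_pm_runtime_disable(d);
}

/* Disable under the same condition as the enable in probe. */
static void toy_remove_balanced(struct toy_dev *d)
{
	if (!d->bus_manages_pm)
		toy_pm_runtime_disable(d);
}

/* Bind, unbind, then let the next driver try to resume the device. */
static int toy_open_after_unbind(bool balanced)
{
	struct toy_dev d = { .disable_depth = 0, .bus_manages_pm = true };

	toy_probe(&d);
	assert(d.disable_depth == 0);	/* probe left the bus state alone */

	if (balanced)
		toy_remove_balanced(&d);
	else
		toy_remove_unbalanced(&d);

	return toy_pm_runtime_resume_and_get(&d);
}
```

In this model toy_open_after_unbind(false) leaves the depth at 1 and
returns -EACCES, while toy_open_after_unbind(true) hands the device
over in the state it was found and returns 0.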