On Fri, 8 Jul 2022 14:51:30 +0530 Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote: > On 7/6/2022 9:09 PM, Alex Williamson wrote: > > On Fri, 1 Jul 2022 16:38:09 +0530 > > Abhishek Sahu <abhsahu@xxxxxxxxxx> wrote: > > > >> This patch adds INTx handling during runtime suspend/resume. > >> All the suspend/resume related code for the user to put the device > >> into the low power state will be added in subsequent patches. > >> > >> The INTx are shared among devices. Whenever any INTx interrupt comes > > > > "The INTx lines may be shared..." > > > >> for the VFIO devices, then vfio_intx_handler() will be called for each > >> device. Inside vfio_intx_handler(), it calls pci_check_and_mask_intx() > > > > "...device sharing the interrupt." > > > >> and checks if the interrupt has been generated for the current device. > >> Now, if the device is already in the D3cold state, then the config space > >> can not be read. Attempt to read config space in D3cold state can > >> cause system unresponsiveness in a few systems. To prevent this, mask > >> INTx in runtime suspend callback and unmask the same in runtime resume > >> callback. If INTx has been already masked, then no handling is needed > >> in runtime suspend/resume callbacks. 'pm_intx_masked' tracks this, and > >> vfio_pci_intx_mask() has been updated to return true if INTx has been > >> masked inside this function. > >> > >> For the runtime suspend which is triggered for the no user of VFIO > >> device, the is_intx() will return false and these callbacks won't do > >> anything. > >> > >> The MSI/MSI-X are not shared so similar handling should not be > >> needed for MSI/MSI-X. vfio_msihandler() triggers eventfd_signal() > >> without doing any device-specific config access. When the user performs > >> any config access or IOCTL after receiving the eventfd notification, > >> then the device will be moved to the D0 state first before > >> servicing any request. > >> > >> Signed-off-by: Abhishek Sahu <abhsahu@xxxxxxxxxx> > >> --- > >> drivers/vfio/pci/vfio_pci_core.c | 37 +++++++++++++++++++++++++++---- > >> drivers/vfio/pci/vfio_pci_intrs.c | 6 ++++- > >> include/linux/vfio_pci_core.h | 3 ++- > >> 3 files changed, 40 insertions(+), 6 deletions(-) > >> > >> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c > >> index a0d69ddaf90d..5948d930449b 100644 > >> --- a/drivers/vfio/pci/vfio_pci_core.c > >> +++ b/drivers/vfio/pci/vfio_pci_core.c > >> @@ -259,16 +259,45 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat > >> return ret; > >> } > >> > >> +#ifdef CONFIG_PM > >> +static int vfio_pci_core_runtime_suspend(struct device *dev) > >> +{ > >> + struct vfio_pci_core_device *vdev = dev_get_drvdata(dev); > >> + > >> + /* > >> + * If INTx is enabled, then mask INTx before going into the runtime > >> + * suspended state and unmask the same in the runtime resume. > >> + * If INTx has already been masked by the user, then > >> + * vfio_pci_intx_mask() will return false and in that case, INTx > >> + * should not be unmasked in the runtime resume. > >> + */ > >> + vdev->pm_intx_masked = (is_intx(vdev) && vfio_pci_intx_mask(vdev)); > >> + > >> + return 0; > >> +} > >> + > >> +static int vfio_pci_core_runtime_resume(struct device *dev) > >> +{ > >> + struct vfio_pci_core_device *vdev = dev_get_drvdata(dev); > >> + > >> + if (vdev->pm_intx_masked) > >> + vfio_pci_intx_unmask(vdev); > >> + > >> + return 0; > >> +} > >> +#endif /* CONFIG_PM */ > >> + > >> /* > >> - * The dev_pm_ops needs to be provided to make pci-driver runtime PM working, > >> - * so use structure without any callbacks. > >> - * > >> * The pci-driver core runtime PM routines always save the device state > >> * before going into suspended state. If the device is going into low power > >> * state with only with runtime PM ops, then no explicit handling is needed > >> * for the devices which have NoSoftRst-. > >> */ > >> -static const struct dev_pm_ops vfio_pci_core_pm_ops = { }; > >> +static const struct dev_pm_ops vfio_pci_core_pm_ops = { > >> + SET_RUNTIME_PM_OPS(vfio_pci_core_runtime_suspend, > >> + vfio_pci_core_runtime_resume, > >> + NULL) > >> +}; > >> > >> int vfio_pci_core_enable(struct vfio_pci_core_device *vdev) > >> { > >> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c > >> index 6069a11fb51a..1a37db99df48 100644 > >> --- a/drivers/vfio/pci/vfio_pci_intrs.c > >> +++ b/drivers/vfio/pci/vfio_pci_intrs.c > >> @@ -33,10 +33,12 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused) > >> eventfd_signal(vdev->ctx[0].trigger, 1); > >> } > >> > >> -void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev) > >> +/* Returns true if INTx has been masked by this function. */ > >> +bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev) > >> { > >> struct pci_dev *pdev = vdev->pdev; > >> unsigned long flags; > >> + bool intx_masked = false; > >> > >> spin_lock_irqsave(&vdev->irqlock, flags); > >> > >> @@ -60,9 +62,11 @@ void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev) > >> disable_irq_nosync(pdev->irq); > >> > >> vdev->ctx[0].masked = true; > >> + intx_masked = true; > >> } > >> > >> spin_unlock_irqrestore(&vdev->irqlock, flags); > >> + return intx_masked; > >> } > > > > > > There's certainly another path through this function that masks the > > interrupt, which makes the definition of this return value a bit > > confusing. > > For our case we should not hit that path. But we can return the > intx_masked true from that path as well to align return value. > > > Wouldn't it be simpler not to overload the masked flag on > > the interrupt context like this and instead set a new flag on the vdev > > under irqlock to indicate the device is unable to generate interrupts. > > The irq handler would add a test of this flag before any tests that > > would access the device. Thanks, > > > > Alex > > > > We will set this flag inside runtime_suspend callback but the > device can be in non-D3cold state (For example, if user has > disabled d3cold explicitly by sysfs, the D3cold is not supported in > the platform, etc.). Also, in D3cold supported case, the device will > be in D0 till the PCI core moves the device into D3cold. In this case, > there is possibility that the device can generate an interrupt. > If we add check in the IRQ handler, then we won’t check and clear > the IRQ status, but the interrupt line will still be asserted > which can cause interrupt flooding. > > This was the reason for disabling interrupt itself instead of > checking flag in the IRQ handler. Ok, maybe this is largely a clarification of the return value of vfio_pci_intx_mask(). I think what you're looking for is whether the context mask was changed, rather than whether the interrupt is masked, which I think avoids the confusion regarding whether the first branch should return true or false. So the variable should be something like "masked_changed" and the comment changed to "Returns true if the INTx vfio_pci_irq_ctx.masked value is changed". Testing is_intx() outside of the irqlock is potentially racy, so do we need to add the pm-get/put wrappers on ioctls first to avoid the possibility that pm-suspend can race a SET_IRQS ioctl? Thanks, Alex