On Fri, 12 Apr 2024 01:21:21 -0700
Yi Liu <yi.l.liu@xxxxxxxxx> wrote:

> Today, vfio-pci hides the PASID capability of devices from userspace. Unlike
> other PCI capabilities, PASID capability is going to be reported to user by
> VFIO_DEVICE_FEATURE. Hence userspace could probe PASID capability by it.
> This is a bit different from the other capabilities which are reported to
> userspace when the user reads the device's PCI configuration space. There
> are two reasons for this.
> 
> - First, userspace like Qemu by default exposes all the available PCI
> capabilities in vfio-pci config space to the guest as read-only, so
> adding PASID capability in the vfio-pci config space will make it
> exposed to the guest automatically while an old Qemu doesn't really
> support it.
> 
> - Second, the PASID capability does not exist on VFs (instead shares the
> cap of the PF). Creating a virtual PASID capability in vfio-pci config
> space needs to find a hole to place it, but doing so may require device
> specific knowledge to avoid potential conflict with device specific
> registers like hidden bits in VF's config space. It's simpler to move
> this burden to the VMM instead of maintaining a quirk system in the kernel.
> 
> Suggested-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Signed-off-by: Yi Liu <yi.l.liu@xxxxxxxxx>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 50 ++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h        | 14 +++++++++
>  2 files changed, 64 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index d94d61b92c1a..ca64e461d435 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1495,6 +1495,54 @@ static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
>  	return 0;
>  }
>  
> +static int vfio_pci_core_feature_pasid(struct vfio_device *device, u32 flags,
> +				       struct vfio_device_feature_pasid __user *arg,
> +				       size_t argsz)
> +{
> +	struct vfio_pci_core_device *vdev =
> +		container_of(device, struct vfio_pci_core_device, vdev);
> +	struct vfio_device_feature_pasid pasid = { 0 };
> +	struct pci_dev *pdev = vdev->pdev;
> +	u32 capabilities = 0;
> +	u16 ctrl = 0;
> +	int ret;
> +
> +	/*
> +	 * Due to no PASID capability per VF, to be consistent, we do not
> +	 * support SET of the PASID capability for both PF and VF.
> +	 */
> +	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
> +				 sizeof(pasid));
> +	if (ret != 1)
> +		return ret;
> +
> +	/* VF shares the PASID capability of its PF */
> +	if (pdev->is_virtfn)
> +		pdev = pci_physfn(pdev);
> +
> +	if (!pdev->pasid_enabled)
> +		goto out;
> +
> +#ifdef CONFIG_PCI_PASID
> +	pci_read_config_dword(pdev, pdev->pasid_cap + PCI_PASID_CAP,
> +			      &capabilities);
> +	pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL,
> +			     &ctrl);
> +#endif
> +
> +	pasid.width = (capabilities >> 8) & 0x1f;
> +
> +	if (ctrl & PCI_PASID_CTRL_EXEC)
> +		pasid.capabilities |= VFIO_DEVICE_PASID_CAP_EXEC;
> +	if (ctrl & PCI_PASID_CTRL_PRIV)
> +		pasid.capabilities |= VFIO_DEVICE_PASID_CAP_PRIV;

I agree with Kevin here, let's make use of and add helpers to avoid
#ifdef blocks of code.
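A minimal userspace sketch of the helper pattern, assuming a hypothetical helper named sketch_pci_pasid_regs(): the #ifdef is confined to the helper's definition and its static inline stub, so the caller needs no conditional block. The existing pci_pasid_features() and pci_max_pasids() helpers in include/linux/pci-ats.h follow the same shape, with stubs that return -EINVAL. Everything here (mock_pci_dev, the SKETCH_* offsets, the local CONFIG_PCI_PASID define) is a stand-in for illustration, not kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for the Kconfig symbol; remove it to build the stub variant. */
#define CONFIG_PCI_PASID 1

/* Offsets within the PASID extended capability (same values as PCI_PASID_*). */
#define SKETCH_PASID_CAP  0x04 /* PASID Capability register, 16 bits */
#define SKETCH_PASID_CTRL 0x06 /* PASID Control register, 16 bits */

/* Hypothetical userspace stand-in for struct pci_dev. */
struct mock_pci_dev {
	uint8_t cfg[4096];  /* fake PCIe config space */
	uint16_t pasid_cap; /* offset of the PASID capability, 0 if absent */
};

static inline uint16_t mock_read_word(const struct mock_pci_dev *pdev,
				      unsigned int off)
{
	assert(off + 1 < sizeof(pdev->cfg));
	/* PCI config space is little-endian */
	return (uint16_t)(pdev->cfg[off] | (pdev->cfg[off + 1] << 8));
}

/* Hypothetical helper: the only code that knows about CONFIG_PCI_PASID. */
#ifdef CONFIG_PCI_PASID
static int sketch_pci_pasid_regs(const struct mock_pci_dev *pdev,
				 uint16_t *cap, uint16_t *ctrl)
{
	if (!pdev->pasid_cap)
		return -EINVAL;
	*cap = mock_read_word(pdev, pdev->pasid_cap + SKETCH_PASID_CAP);
	*ctrl = mock_read_word(pdev, pdev->pasid_cap + SKETCH_PASID_CTRL);
	return 0;
}
#else
static inline int sketch_pci_pasid_regs(const struct mock_pci_dev *pdev,
					uint16_t *cap, uint16_t *ctrl)
{
	(void)pdev;
	(void)cap;
	(void)ctrl;
	return -EINVAL; /* PASID support compiled out */
}
#endif

/* Caller: no #ifdef; the stub simply reports failure. */
static int sketch_pasid_width(const struct mock_pci_dev *pdev)
{
	uint16_t cap, ctrl;
	int ret = sketch_pci_pasid_regs(pdev, &cap, &ctrl);

	if (ret)
		return ret;
	return (cap >> 8) & 0x1f; /* Max PASID Width, bits 12:8 */
}
```

With the helper in a header, the driver code reads the same whether or not PASID support is configured, which is the point of avoiding #ifdef in .c files.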
> + > +out: > + if (copy_to_user(arg, &pasid, sizeof(pasid))) > + return -EFAULT; > + return 0; > +} > + > int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags, > void __user *arg, size_t argsz) > { > @@ -1508,6 +1556,8 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags, > return vfio_pci_core_pm_exit(device, flags, arg, argsz); > case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN: > return vfio_pci_core_feature_token(device, flags, arg, argsz); > + case VFIO_DEVICE_FEATURE_PASID: > + return vfio_pci_core_feature_pasid(device, flags, arg, argsz); > default: > return -ENOTTY; > } > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index 9591dc24b75c..e50e55c67ab4 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -1513,6 +1513,20 @@ struct vfio_device_feature_bus_master { > }; > #define VFIO_DEVICE_FEATURE_BUS_MASTER 10 > > +/** > + * Upon VFIO_DEVICE_FEATURE_GET, return the PASID capability for the device. > + * Zero width means no support for PASID. Why would we do that rather than reporting the feature as unsupported? Just return -ENOTTY if PASID is not supported or enabled. > + */ > +struct vfio_device_feature_pasid { > + __u16 capabilities; > +#define VFIO_DEVICE_PASID_CAP_EXEC (1 << 0) > +#define VFIO_DEVICE_PASID_CAP_PRIV (1 << 1) > + __u8 width; > + __u8 __reserved; > +}; Building on Kevin's comment on the cover letter, if we could describe an offset for emulating a PASID capability, this seems like the place we'd do it. I think we're not doing that because we'd like an in-band mechanism for a device to report unused config space, such as a DVSEC capability, so that it can be implemented on a physical device. As noted in the commit log here, we'd also prefer not to bloat the kernel with more device quirks. 
In an ideal world we might be able to jump-start support of that DVSEC
option by emulating the DVSEC capability on top of the PASID capability
for PFs, but unfortunately the PASID capability is 8 bytes while the
DVSEC capability is at least 12 bytes, so we can't implement that
generically either.

I don't know that there's any good solution here or whether there's
actually any value to the PASID capability on a PF, but do we need to
consider leaving a field+flag here to describe the offset for that
scenario? Would we then allow variant drivers to take advantage of it?
Does this then turn into the quirk that we're trying to avoid in the
kernel rather than userspace and is that a problem? Thanks,

Alex

> +
> +#define VFIO_DEVICE_FEATURE_PASID 11
> +
>  /* -------- API for Type1 VFIO IOMMU -------- */
>  
>  /**
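One possible shape for the field+flag question, as a purely hypothetical sketch: the patch's reserved byte becomes a flags byte, and a new cap_offset field carries a suggested config-space offset that the kernel (or a variant driver) could fill in. The VMM uses the hint only when the flag is set and the offset is a plausible extended-capability location (0x100 to 0xFFF, dword aligned); otherwise it falls back to finding a hole itself. None of these names exist in the uapi.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PASID_CAP_SIZEOF 8 /* size of the PASID extended capability */

/* Hypothetical extension of the proposed uapi struct. */
struct sketch_feature_pasid_v2 {
	uint16_t capabilities;
	uint8_t width;
	uint8_t flags; /* replaces the patch's __reserved byte */
#define SKETCH_PASID_FLAG_OFFSET_VALID (1 << 0)
	uint16_t cap_offset; /* suggested offset, valid only with the flag */
	uint16_t reserved;
};

static_assert(sizeof(struct sketch_feature_pasid_v2) == 8,
	      "no implicit padding in the uapi layout");

/*
 * VMM side: offset at which to place a virtual PASID capability, or -1
 * if there is no usable hint and the VMM must find a hole on its own.
 */
static int sketch_pasid_placement(const struct sketch_feature_pasid_v2 *p)
{
	unsigned int off = p->cap_offset;

	if (!(p->flags & SKETCH_PASID_FLAG_OFFSET_VALID))
		return -1;
	/* Extended capabilities live in 0x100-0xFFF and are dword aligned. */
	if (off < 0x100 || (off & 0x3) ||
	    off + SKETCH_PASID_CAP_SIZEOF > 0x1000)
		return -1;
	return (int)off;
}
```

Whoever fills in cap_offset still needs device-specific knowledge of unused config space, which is exactly the quirk question raised above; the sketch only shows where such a hint could live in the interface.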