At the moment VFIO does not allow mapping MSIX BAR to the userspace and provides information about what can be mapped via a PCI region capability. This ensures that the userspace does not get access to MSIX vector table and cannot break MSIX setup. For most platforms the userspace also chooses not to map MSIX as it has to emulate it for fully virtualized guests. This creates a problem when an MSIX BAR is not aligned to a system page size and in order to prevent mapping, chunks before and/or after the MSIX BAR can't also be mapped to the userspace. If these chunks happen to be used very often (for example, Emulex LightPulse FC), we see half of expected performance. This enables mapping of MSIX BARs to the userspace by advertising these BARs as a single chunk (as any other BAR). The userspace still is expected not to mmap an MSIX BAR if it want to emulate it (QEMU still has this code). This does not break protection as at the moment for the Type1 IOMMU, vfio_iommu_type1_attach_group() hook fails if MSIX remapping is not enabled and this ensures that even ill-behaved device cannot harm the host. For sPAPR IOMMU, the hardware always provides such isolation by default. The only case when such mapping protection is rather needed is allow_unsafe_interrupts but as its name suggests, it is not safe anyway. Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx> --- This is to replace 6 patches from Yongji "[RESEND PATCH v2 0/6] vfio-pci: Add support for mmapping MSI-X table" Previously we tried to add a MSI_REMAP flag to a PCI bus and set it via various means for various platforms. It relied on MSI_FLAG_IRQ_REMAPPING from msi_domain_info which never made it to the upstream. Instead Type1 IOMMU learned to check remapping capabilities and refuse to work and it seems like a good idea to use this. How much do we care about allow_unsafe_interrupts? If we care enough, I'll need a way to pass the msi_remap flag from vfio_iommu_type1_attach_group() down to vfio_pci_bar_rw/vfio_pci_mmap, it does not seem easy to make it nice. What else did I miss? Please comment. Thanks. --- drivers/vfio/pci/vfio_pci.c | 41 ++++------------------------------------ drivers/vfio/pci/vfio_pci_rdwr.c | 5 ----- 2 files changed, 4 insertions(+), 42 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 324c52e3a1a4..222bc07b9e41 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -559,15 +559,8 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev, struct vfio_info_cap *caps) { struct vfio_region_info_cap_sparse_mmap *sparse; - size_t end, size; - int nr_areas = 2, i = 0, ret; - - end = pci_resource_len(vdev->pdev, vdev->msix_bar); - - /* If MSI-X table is aligned to the start or end, only one area */ - if (((vdev->msix_offset & PAGE_MASK) == 0) || - (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) >= end)) - nr_areas = 1; + size_t size; + int nr_areas = 1, i = 0, ret; size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas)); @@ -577,18 +570,8 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev, sparse->nr_areas = nr_areas; - if (vdev->msix_offset & PAGE_MASK) { - sparse->areas[i].offset = 0; - sparse->areas[i].size = vdev->msix_offset & PAGE_MASK; - i++; - } - - if (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) < end) { - sparse->areas[i].offset = PAGE_ALIGN(vdev->msix_offset + - vdev->msix_size); - sparse->areas[i].size = end - sparse->areas[i].offset; - i++; - } + sparse->areas[i].offset = 0; + sparse->areas[i].size = pci_resource_len(vdev->pdev, vdev->msix_bar); ret = vfio_info_add_capability(caps, VFIO_REGION_INFO_CAP_SPARSE_MMAP, sparse); @@ -1115,22 +1098,6 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) if (req_start + req_len > phys_len) return -EINVAL; - if (index == vdev->msix_bar) { - /* - * Disallow mmaps overlapping the MSI-X table; users don't - * get to touch this directly. We could find somewhere - * else to map the overlap, but page granularity is only - * a recommendation, not a requirement, so the user needs - * to know which bits are real. Requiring them to mmap - * around the table makes that clear. - */ - - /* If neither entirely above nor below, then it overlaps */ - if (!(req_start >= vdev->msix_offset + vdev->msix_size || - req_start + req_len <= vdev->msix_offset)) - return -EINVAL; - } - /* * Even though we don't make use of the barmap for the mmap, * we need to request the region and the barmap tracks that. diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index 357243d76f10..916b72c5c4fc 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -164,11 +164,6 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf, } else io = vdev->barmap[bar]; - if (bar == vdev->msix_bar) { - x_start = vdev->msix_offset; - x_end = vdev->msix_offset + vdev->msix_size; - } - done = do_io_rw(io, buf, pos, count, x_start, x_end, iswrite); if (done >= 0) -- 2.11.0