[RFC PATCH kernel] vfio-pci: Allow mapping of MSI-X table to userspace

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



At the moment VFIO does not allow mapping MSIX BAR to the userspace
and provides information about what can be mapped via a PCI region
capability. This ensures that the userspace does not get access
to MSIX vector table and cannot break MSIX setup. For most platforms
the userspace also chooses not to map MSIX as it has to emulate it for
fully virtualized guests.

This creates a problem when an MSIX BAR is not aligned to a system
page size and in order to prevent mapping, chunks before and/or after
the MSIX BAR can't also be mapped to the userspace. If these chunks
happen to be used very often (for example, Emulex LightPulse FC),
we see half of expected performance.

This enables mapping of MSIX BARs to the userspace by advertising
these BARs as a single chunk (as any other BAR). The userspace
still is expected not to mmap an MSIX BAR if it want to emulate it
(QEMU still has this code).

This does not break protection as at the moment for the Type1 IOMMU,
vfio_iommu_type1_attach_group() hook fails if MSIX remapping
is not enabled and this ensures that even ill-behaved device cannot
harm the host. For sPAPR IOMMU, the hardware always provides such
isolation by default. The only case when such mapping protection
is rather needed is allow_unsafe_interrupts but as its name suggests,
it is not safe anyway.

Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>

---

This is to replace 6 patches from Yongji
"[RESEND PATCH v2 0/6] vfio-pci: Add support for mmapping MSI-X table"

Previously we tried to add a MSI_REMAP flag to a PCI bus and set it
via various means for various platforms. It relied on
MSI_FLAG_IRQ_REMAPPING from msi_domain_info which never made it to
the upstream. Instead Type1 IOMMU learned to check remapping
capabilities and refuse to work and it seems like a good idea to use
this.

How much do we care about allow_unsafe_interrupts? If we care enough,
I'll need a way to pass the msi_remap flag from
vfio_iommu_type1_attach_group() down to vfio_pci_bar_rw/vfio_pci_mmap,
it does not seem easy to make it nice.

What else did I miss?

Please comment. Thanks.
---
 drivers/vfio/pci/vfio_pci.c      | 41 ++++------------------------------------
 drivers/vfio/pci/vfio_pci_rdwr.c |  5 -----
 2 files changed, 4 insertions(+), 42 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e3a1a4..222bc07b9e41 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -559,15 +559,8 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
 				struct vfio_info_cap *caps)
 {
 	struct vfio_region_info_cap_sparse_mmap *sparse;
-	size_t end, size;
-	int nr_areas = 2, i = 0, ret;
-
-	end = pci_resource_len(vdev->pdev, vdev->msix_bar);
-
-	/* If MSI-X table is aligned to the start or end, only one area */
-	if (((vdev->msix_offset & PAGE_MASK) == 0) ||
-	    (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) >= end))
-		nr_areas = 1;
+	size_t size;
+	int nr_areas = 1, i = 0, ret;
 
 	size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas));
 
@@ -577,18 +570,8 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
 
 	sparse->nr_areas = nr_areas;
 
-	if (vdev->msix_offset & PAGE_MASK) {
-		sparse->areas[i].offset = 0;
-		sparse->areas[i].size = vdev->msix_offset & PAGE_MASK;
-		i++;
-	}
-
-	if (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) < end) {
-		sparse->areas[i].offset = PAGE_ALIGN(vdev->msix_offset +
-						     vdev->msix_size);
-		sparse->areas[i].size = end - sparse->areas[i].offset;
-		i++;
-	}
+	sparse->areas[i].offset = 0;
+	sparse->areas[i].size = pci_resource_len(vdev->pdev, vdev->msix_bar);
 
 	ret = vfio_info_add_capability(caps, VFIO_REGION_INFO_CAP_SPARSE_MMAP,
 				       sparse);
@@ -1115,22 +1098,6 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	if (req_start + req_len > phys_len)
 		return -EINVAL;
 
-	if (index == vdev->msix_bar) {
-		/*
-		 * Disallow mmaps overlapping the MSI-X table; users don't
-		 * get to touch this directly.  We could find somewhere
-		 * else to map the overlap, but page granularity is only
-		 * a recommendation, not a requirement, so the user needs
-		 * to know which bits are real.  Requiring them to mmap
-		 * around the table makes that clear.
-		 */
-
-		/* If neither entirely above nor below, then it overlaps */
-		if (!(req_start >= vdev->msix_offset + vdev->msix_size ||
-		      req_start + req_len <= vdev->msix_offset))
-			return -EINVAL;
-	}
-
 	/*
 	 * Even though we don't make use of the barmap for the mmap,
 	 * we need to request the region and the barmap tracks that.
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 357243d76f10..916b72c5c4fc 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -164,11 +164,6 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 	} else
 		io = vdev->barmap[bar];
 
-	if (bar == vdev->msix_bar) {
-		x_start = vdev->msix_offset;
-		x_end = vdev->msix_offset + vdev->msix_size;
-	}
-
 	done = do_io_rw(io, buf, pos, count, x_start, x_end, iswrite);
 
 	if (done >= 0)
-- 
2.11.0




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux