Re: [PATCH] iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE

Robin Murphy <robin.murphy@xxxxxxx> · Wed, 18 Mar 2020 17:13:59 +0000

On 2020-03-18 4:14 pm, Auger Eric wrote:
Hi,

On 3/18/20 1:00 PM, Robin Murphy wrote:
On 2020-03-18 11:40 am, Jean-Philippe Brucker wrote:
We don't currently support IOMMUs with a page granule larger than the
system page size. The IOVA allocator has a BUG_ON() in this case, and
VFIO has a WARN_ON().

Adding Alex in CC in case he has time to jump in. At the moment I don't
get why this WARN_ON() is here.

This was introduced in
c8dbca165bb090f926996a572ea2b5b577b34b70 vfio/iommu_type1: Avoid overflow


It might be possible to remove these obstacles if necessary. If the host
uses 64kB pages and the guest uses 4kB, then a device driver calling
alloc_page() followed by dma_map_page() will create a 64kB mapping for a
4kB physical page, allowing the endpoint to access the neighbouring 60kB
of memory. This problem could be worked around with bounce buffers.

FWIW the fundamental issue is that callers of iommu_map() may expect to
be able to map two or more page-aligned regions directly adjacent to
each other for scatter-gather purposes (or ring buffer tricks), and
that's just not possible if the IOMMU granule is too big. Bounce
buffering would be a viable workaround for the streaming DMA API and
certain similar use-cases, but not in general (e.g. coherent DMA, VFIO,
GPUs, etc.)

Robin.

For the moment, rather than triggering the IOVA BUG_ON() on mismatched
page sizes, abort the virtio-iommu probe with an error message.

I understand this is a introduced as a temporary solution but this
sounds as an important limitation to me. For instance this will prevent
from running a fedora guest exposed with a virtio-iommu with a RHEL host.

As above, even if you bypassed all the warnings it wouldn't really work 
properly anyway. In all cases that wouldn't be considered broken, the 
underlying hardware IOMMUs should support the same set of granules as 
the CPUs (or at least the smallest one), so is it actually appropriate 
for RHEL to (presumably) expose a 64K granule in the first place, rather 
than "works with anything" 4K? And/or more generally is there perhaps a 
hole in the virtio-iommu spec WRT being able to negotiate page_size_mask 
for a particular granule if multiple options are available?

Robin.


Thanks

Eric

Reported-by: Bharat Bhushan <bbhushan2@xxxxxxxxxxx>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
---
   drivers/iommu/virtio-iommu.c | 9 +++++++++
   1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 6d4e3c2a2ddb..80d5d8f621ab 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -998,6 +998,7 @@ static int viommu_probe(struct virtio_device *vdev)
       struct device *parent_dev = vdev->dev.parent;
       struct viommu_dev *viommu = NULL;
       struct device *dev = &vdev->dev;
+    unsigned long viommu_page_size;
       u64 input_start = 0;
       u64 input_end = -1UL;
       int ret;
@@ -1028,6 +1029,14 @@ static int viommu_probe(struct virtio_device
*vdev)
           goto err_free_vqs;
       }
   +    viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
+    if (viommu_page_size > PAGE_SIZE) {
+        dev_err(dev, "granule 0x%lx larger than system page size
0x%lx\n",
+            viommu_page_size, PAGE_SIZE);
+        ret = -EINVAL;
+        goto err_free_vqs;
+    }
+
       viommu->map_flags = VIRTIO_IOMMU_MAP_F_READ |
VIRTIO_IOMMU_MAP_F_WRITE;
       viommu->last_domain = ~0U;
  


_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization