On Wed, Mar 18, 2020 at 05:13:59PM +0000, Robin Murphy wrote:
> On 2020-03-18 4:14 pm, Auger Eric wrote:
> > Hi,
> >
> > On 3/18/20 1:00 PM, Robin Murphy wrote:
> > > On 2020-03-18 11:40 am, Jean-Philippe Brucker wrote:
> > > > We don't currently support IOMMUs with a page granule larger than
> > > > the system page size. The IOVA allocator has a BUG_ON() in this
> > > > case, and VFIO has a WARN_ON().
> >
> > Adding Alex in CC in case he has time to jump in. At the moment I
> > don't get why this WARN_ON() is here.
> >
> > This was introduced in
> > c8dbca165bb090f926996a572ea2b5b577b34b70 vfio/iommu_type1: Avoid overflow
> >
> > > >
> > > > It might be possible to remove these obstacles if necessary. If the
> > > > host uses 64kB pages and the guest uses 4kB, then a device driver
> > > > calling alloc_page() followed by dma_map_page() will create a 64kB
> > > > mapping for a 4kB physical page, allowing the endpoint to access
> > > > the neighbouring 60kB of memory. This problem could be worked
> > > > around with bounce buffers.
> > >
> > > FWIW the fundamental issue is that callers of iommu_map() may expect
> > > to be able to map two or more page-aligned regions directly adjacent
> > > to each other for scatter-gather purposes (or ring buffer tricks),
> > > and that's just not possible if the IOMMU granule is too big. Bounce
> > > buffering would be a viable workaround for the streaming DMA API and
> > > certain similar use-cases, but not in general (e.g. coherent DMA,
> > > VFIO, GPUs, etc.)
> > >
> > > Robin.
> > >
> > > > For the moment, rather than triggering the IOVA BUG_ON() on
> > > > mismatched page sizes, abort the virtio-iommu probe with an error
> > > > message.
> >
> > I understand this is introduced as a temporary solution but it sounds
> > like an important limitation to me. For instance it will prevent
> > running a Fedora guest exposed with a virtio-iommu on a RHEL host.
>
> As above, even if you bypassed all the warnings it wouldn't really work
> properly anyway. In all cases that wouldn't be considered broken, the
> underlying hardware IOMMUs should support the same set of granules as
> the CPUs (or at least the smallest one), so is it actually appropriate
> for RHEL to (presumably) expose a 64K granule in the first place, rather
> than "works with anything" 4K? And/or more generally, is there perhaps a
> hole in the virtio-iommu spec WRT being able to negotiate page_size_mask
> for a particular granule if multiple options are available?

That could be added (by allowing config write). The larger problems are:

1) Supporting granularity smaller than the host's PAGE_SIZE in host VFIO.
   At the moment it restricts the exposed page mask and rejects DMA_MAP
   requests not aligned on that granularity.

2) Propagating this negotiation all the way to io-pgtable and the SMMU
   driver in the host.

Thanks,
Jean
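
For reference, the probe-time check described in the quoted patch ("abort
the virtio-iommu probe with an error message") could look roughly like the
sketch below. This is only an illustration, not the actual patch: the
function name, the viommu_dev fields and the error code are assumptions
based on the existing virtio-iommu driver.

/*
 * Illustrative sketch: refuse to probe when the smallest page granule
 * supported by the device is larger than the host's PAGE_SIZE.
 * Assumes the device's supported page sizes were already read into
 * viommu->pgsize_bitmap; all names here are assumptions.
 */
static int viommu_check_pgsize(struct viommu_dev *viommu)
{
	/* The lowest set bit of the mask is the smallest granule */
	unsigned long granule = 1UL << __ffs(viommu->pgsize_bitmap);

	if (granule > PAGE_SIZE) {
		dev_err(viommu->dev,
			"granule 0x%lx larger than system page size 0x%lx\n",
			granule, PAGE_SIZE);
		return -EINVAL;
	}

	return 0;
}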