On 10.01.25 14:20, Jason Gunthorpe wrote:
Thanks for your reply, I knew CCing you would be very helpful :)
On Fri, Jan 10, 2025 at 09:26:02AM +0100, David Hildenbrand wrote:
One limitation (also discussed in the guest_memfd
meeting) is that VFIO expects the DMA mapping for
a specific IOVA to be mapped and unmapped with the
same granularity.
Not just same granularity, whatever you map you have to unmap in
whole. map/unmap must be perfectly paired by userspace.
Right, that's what virtio-mem ends up doing: it maps each memory block
(e.g., 2 MiB) that could later be unmapped on its own as a separate
mapping.
It adds "overhead", but at least you don't run into "no, you cannot
split this region because you would be out of memory/slots" or, as in
the past, into issues with concurrent ongoing DMA.
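
To make the pairing rule concrete, here is a toy model in C. It is only
a sketch and not the VFIO uAPI: toy_iommu, toy_map(), toy_unmap(),
toy_map_blocks() and MAX_MAPPINGS are names made up for illustration,
and the model is stricter than the real interface in that an unmap must
match exactly one earlier map. It shows why a region mapped in one go
cannot be partially unmapped, while per-block mappings can be dropped
one block at a time, at the cost of one slot per block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 64                     /* hypothetical slot limit */
#define BLOCK_SIZE   (2ULL * 1024 * 1024)   /* 2 MiB memory block */

struct mapping {
	uint64_t iova;
	uint64_t size;
	bool used;
};

struct toy_iommu {
	struct mapping slots[MAX_MAPPINGS];
};

/* Record one mapping; fail on overlap or when all slots are in use. */
int toy_map(struct toy_iommu *t, uint64_t iova, uint64_t size)
{
	size_t free_slot = MAX_MAPPINGS;

	assert(size > 0);
	for (size_t i = 0; i < MAX_MAPPINGS; i++) {
		const struct mapping *m = &t->slots[i];

		if (!m->used) {
			if (free_slot == MAX_MAPPINGS)
				free_slot = i;
			continue;
		}
		/* Reject ranges that overlap an existing mapping. */
		if (iova < m->iova + m->size && m->iova < iova + size)
			return -1;
	}
	if (free_slot == MAX_MAPPINGS)
		return -1;	/* out of slots */

	t->slots[free_slot] = (struct mapping){ iova, size, true };
	return 0;
}

/* Unmap succeeds only if (iova, size) exactly matches an earlier map. */
int toy_unmap(struct toy_iommu *t, uint64_t iova, uint64_t size)
{
	for (size_t i = 0; i < MAX_MAPPINGS; i++) {
		struct mapping *m = &t->slots[i];

		if (m->used && m->iova == iova && m->size == size) {
			m->used = false;
			return 0;
		}
	}
	return -1;	/* partial or unknown range: refused */
}

/*
 * Map a region as individual BLOCK_SIZE mappings so that each block
 * can be unmapped separately. Assumes size is a multiple of BLOCK_SIZE.
 */
int toy_map_blocks(struct toy_iommu *t, uint64_t iova, uint64_t size)
{
	for (uint64_t off = 0; off < size; off += BLOCK_SIZE) {
		if (toy_map(t, iova + off, BLOCK_SIZE))
			return -1;
	}
	return 0;
}
```

In this model, mapping 8 MiB with a single toy_map() call and then
calling toy_unmap() on the second 2 MiB block fails, whereas the same
unmap succeeds after toy_map_blocks(). The trade-off is the one
mentioned above: four slots instead of one.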
such as converting a small region within a larger
region. To prevent such invalid cases, all
operations are performed with 4K granularity. The
possible solutions we can think of are either to
enable VFIO to support partial unmap
Yes, you can do that, but it is awful for performance everywhere
Absolutely.
In your commit I read:
"Implement the cut operation to be hitless, changes to the page table
during cutting must cause zero disruption to any ongoing DMA. This is
the expectation of the VFIO type 1 uAPI. Hitless requires HW support, it
is incompatible with HW requiring break-before-make."
So I guess that would mean that, depending on HW support, one could
avoid disabling large pages to still allow for atomic cuts / partial
unmaps that don't affect concurrent DMA.
What would be your suggestion here to avoid the "map each 4k page
individually so we can unmap it individually" approach? I didn't
completely grasp that, sorry.
From "IIRC you can only trigger split using the VFIO type 1 legacy API.
We would need to formalize split as an IOMMUFD native ioctl.
Nobody should use this stuff through the legacy type 1 API!!!!"
I assume you mean that we can only avoid the 4k map/unmap if we add
proper support as an IOMMUFD native ioctl, rather than trying to make
it fly somehow with the legacy type 1 API?
--
Cheers,
David / dhildenb