> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Saturday, May 8, 2021 1:06 AM
>
> > > Those are the main ones I can think of. It is nice to have a simple
> > > map/unmap interface, I'd hope that a new /dev/ioasid interface wouldn't
> > > raise the barrier to entry too high, but the user needs to have the
> > > ability to have more control of their mappings and locked page
> > > accounting should probably be offloaded somewhere. Thanks,
> > >
> >
> > Based on your feedbacks I feel it's probably reasonable to start with
> > a type1v2 semantics for the new interface. Locked accounting could
> > also start with the same VFIO restriction and then improve it
> > incrementally, if a cleaner way is intrusive (if not affecting uAPI).
> > But I didn't get the suggestion on "more control of their mappings".
> > Can you elaborate?
>
> Things like I note above, userspace cannot currently specify mapping
> granularity nor has any visibility to the granularity they get from the
> IOMMU. What actually happens in the IOMMU is pretty opaque to the user
> currently. Thanks,
>

It's much clearer.
Based on all the discussions so far, I'm thinking about a staged approach to
building the new interface, basically following the model Jason pointed out:
generic stuff first, then platform-specific extensions.

Phase 1: /dev/ioasid with core ingredients and vfio type1v2 semantics
  - ioasid is the software handle representing an I/O page table
  - uAPI accepts type1v2 map/unmap semantics per ioasid
  - helpers for VFIO/VDPA to bind ioasid_fd and attach ioasids
  - multiple ioasids are allowed without nesting (vIOMMU, or devices
    w/ incompatible iommu attributes)
  - an ioasid disallows any operation before it's attached to a device
  - an ioasid inherits iommu attributes from the first device attached to it
  - userspace is expected to manage hardware restrictions, and the kernel
    only returns an error when those restrictions are broken
    * map/unmap on an ioasid will fail until every device in a group is
      attached to it
    * ioasid attach will fail if the new device has iommu attributes
      incompatible with those of this ioasid
  - thus no group semantics in uAPI
  - no change to vfio container/group/type1 logic, so existing vfio
    applications keep working
    * implies some duplication between vfio type1 and ioasid for some time
  - new uAPI in vfio to allow explicitly opening a device and then binding
    it to the ioasid_fd
    * possibly requiring each device to be exposed in /dev/vfio/
  - support both pdev and mdev

Phase 2: ioasid nesting
  - Allow bind/unbind_pgtable semantics per ioasid
  - Allow ioasid nesting
    * HW ioasid nesting if supported by the platform
    * otherwise fall back to SW ioasid nesting (in-kernel shadowing)
  - iotlb invalidation per ioasid
  - I/O page fault handling per ioasid
  - hw_id is not exposed in uAPI.
    The vendor IOMMU driver decides when and how hw_id is allocated and
    programmed.

Phase 3: optimizations and vendor extensions (order undefined, up to each
specific feature owner):
  - (Intel) ENQCMD support with hw_id exposure in uAPI
  - (ARM/AMD) RID-based pasid table assignment
  - (PPC) window-based iova management
  - Optimizations:
    * replace vfio type1 with a shim driver that uses the ioasid backend
    * mapping granularity
    * HW dirty page tracking
    * ...

Does the above sound like a sensible plan? If yes, we'll start working on
phase 1 then...

Thanks
Kevin