RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

"Tian, Kevin" <kevin.tian@xxxxxxxxx> · Sat, 8 May 2021 07:31:18 +0000

> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Saturday, May 8, 2021 1:06 AM
> 
> > > Those are the main ones I can think of.  It is nice to have a simple
> > > map/unmap interface, I'd hope that a new /dev/ioasid interface wouldn't
> > > raise the barrier to entry too high, but the user needs to have the
> > > ability to have more control of their mappings and locked page
> > > accounting should probably be offloaded somewhere.  Thanks,
> > >
> >
> > Based on your feedbacks I feel it's probably reasonable to start with
> > a type1v2 semantics for the new interface. Locked accounting could
> > also start with the same VFIO restriction and then improve it
> > incrementally, if a cleaner way is intrusive (if not affecting uAPI).
> > But I didn't get the suggestion on "more control of their mappings".
> > Can you elaborate?
> 
> Things like I note above, userspace cannot currently specify mapping
> granularity nor has any visibility to the granularity they get from the
> IOMMU.  What actually happens in the IOMMU is pretty opaque to the user
> currently.  Thanks,
> 

That's much clearer now. Based on all the discussions so far, I'm thinking
about a staged approach to building the new interface, basically following
the model that Jason pointed out: generic stuff first, then platform-specific
extensions:

Phase 1: /dev/ioasid with core ingredients and vfio type1v2 semantics
    - ioasid is the software handle representing an I/O page table
    - uAPI accepts a type1v2 map/unmap semantics per ioasid
    - helpers for VFIO/VDPA to bind ioasid_fd and attach ioasids
    - multiple ioasids are allowed without nesting (vIOMMU, or devices
      w/ incompatible iommu attributes)
    - an ioasid disallows any operation before it's attached to a device
    - an ioasid inherits iommu attributes from the 1st device attached
      to it
    - userspace is expected to manage hardware restrictions and the
      kernel only returns an error when restrictions are broken
        * map/unmap on an ioasid will fail before every device in a group
          is attached to it
        * ioasid attach will fail if the new device has iommu attributes
          incompatible with those of this ioasid
    - thus no group semantics in uAPI
    - no change to vfio container/group/type1 logic, so that existing
      vfio applications keep running
        * this implies some duplication between vfio type1 and ioasid for
          some time
    - new uAPI in vfio to allow explicit opening of a device and then binding
      it to the ioasid_fd
        * possibly requires each device to be exposed in /dev/vfio/
    - support both pdev and mdev
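
To make the Phase 1 restrictions concrete, here is a rough userspace model
in C of the attach and map checks listed above. It is only a sketch and not
proposed kernel code: the names (model_ioasid, model_dev, model_attach,
model_map), the flat "attrs" bitmask and the specific errno values are all
made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; none of these names exist in the kernel. */
struct model_ioasid {
    bool has_attrs;      /* false until the first device is attached */
    unsigned int attrs;  /* inherited from the first attached device */
};

struct model_dev {
    int group;                         /* iommu group id */
    unsigned int attrs;                /* opaque iommu attribute bits */
    const struct model_ioasid *owner;  /* NULL while unattached */
};

/* Attach: the first device donates its attributes, later ones must match. */
int model_attach(struct model_ioasid *ioasid, struct model_dev *dev)
{
    assert(ioasid && dev);
    if (dev->owner)
        return -EBUSY;                 /* assumed errno */
    if (!ioasid->has_attrs) {
        ioasid->attrs = dev->attrs;
        ioasid->has_attrs = true;
    } else if (ioasid->attrs != dev->attrs) {
        return -EINVAL;                /* incompatible attributes, assumed errno */
    }
    dev->owner = ioasid;
    return 0;
}

/*
 * Map check: fails if no device is attached yet, or if some group has
 * only part of its devices attached to this ioasid.
 */
int model_map(const struct model_ioasid *ioasid,
              const struct model_dev *devs, size_t ndevs)
{
    bool any = false;

    for (size_t i = 0; i < ndevs; i++) {
        if (devs[i].owner != ioasid)
            continue;
        any = true;
        for (size_t j = 0; j < ndevs; j++) {
            if (devs[j].group == devs[i].group && devs[j].owner != ioasid)
                return -EPERM;         /* group not fully attached, assumed errno */
        }
    }
    return any ? 0 : -ENODEV;          /* no operation before attach, assumed errno */
}
```

Note that the model has no group object on the user-facing side: the group
only shows up as a failure condition inside model_map(), which is the sense
in which the uAPI carries no group semantics.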

Phase 2: ioasid nesting
    - Allow bind/unbind_pgtable semantics per ioasid
    - Allow ioasid nesting 
        * HW ioasid nesting if supported by platform
        * otherwise fall back to SW ioasid nesting (in-kernel shadowing)
    - iotlb invalidation per ioasid
    - I/O page fault handling per ioasid
    - hw_id is not exposed in uAPI. The vendor IOMMU driver decides
      when and how hw_id is allocated and programmed
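
As a rough illustration of the Phase 2 points above, the following C sketch
models the HW-vs-SW nesting fallback and the idea that hw_id stays inside
the kernel. Everything here is hypothetical: the type and function names,
the single "platform_hw_nesting" capability flag, and the split into a
kernel-side struct and a user-visible view are assumptions made for the
example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the Phase 2 nesting choice; not a kernel API. */
enum nest_backend {
    NEST_BACKEND_HW,         /* platform performs the nested walk in hardware */
    NEST_BACKEND_SW_SHADOW,  /* kernel shadows the page table in software */
};

/* Kernel-side state in this model. */
struct nested_ioasid_model {
    int id;                  /* software handle of the nested ioasid */
    int parent_id;           /* handle of the ioasid it is nested on */
    enum nest_backend backend;
    uint32_t hw_id_private;  /* owned by the vendor driver, never exported */
};

/* What userspace would see in this model: no hw_id field at all. */
struct ioasid_user_view {
    int id;
    int parent_id;
};

/* Pick HW nesting when the platform supports it, else fall back to shadowing. */
struct nested_ioasid_model model_nest(int id, int parent_id,
                                      bool platform_hw_nesting)
{
    struct nested_ioasid_model n;

    assert(id != parent_id); /* an ioasid cannot be nested on itself */
    n.id = id;
    n.parent_id = parent_id;
    n.backend = platform_hw_nesting ? NEST_BACKEND_HW : NEST_BACKEND_SW_SHADOW;
    n.hw_id_private = 0;     /* left unset here; a vendor driver would fill it */
    return n;
}

/* Project the kernel-side state onto the fields exposed to userspace. */
struct ioasid_user_view model_user_view(const struct nested_ioasid_model *n)
{
    struct ioasid_user_view v = { n->id, n->parent_id };
    return v;
}
```

In this model userspace makes the same request either way and never learns
which backend was chosen or which hw_id, if any, was programmed.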

Phase 3: optimizations and vendor extensions (order undefined, up to
the specific feature owner):
    - (Intel) ENQCMD support with hw_id exposure in uAPI
    - (ARM/AMD) RID-based pasid table assignment
    - (PPC) window-based iova management
    - Optimizations:
        * replace vfio type1 with a shim driver to use ioasid backend
        * mapping granularity
        * HW dirty page tracking
        * ...

Does the above sound like a sensible plan? If yes, we'll start working on
Phase 1 then.

Thanks
Kevin