RE: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Wednesday, May 11, 2022 3:00 AM
> 
> On Tue, May 10, 2022 at 05:12:04PM +1000, David Gibson wrote:
> > Ok... here's a revised version of my proposal which I think addresses
> > your concerns and simplifies things.
> >
> > - No new operations, but IOAS_MAP gets some new flags (and IOAS_COPY
> >   will probably need matching changes)
> >
> > - By default the IOVA given to IOAS_MAP is a hint only, and the IOVA
> >   is chosen by the kernel within the aperture(s).  This is closer to
> >   how mmap() operates, and DPDK and similar shouldn't care about
> >   having specific IOVAs, even at the individual mapping level.
> >
> > - IOAS_MAP gets an IOMAP_FIXED flag, analogous to mmap()'s MAP_FIXED,
> >   for when you really do want to control the IOVA (qemu, maybe some
> >   special userspace driver cases)
> 
> We already did both of these, the flag is called
> IOMMU_IOAS_MAP_FIXED_IOVA - if it is not specified then kernel will
> select the IOVA internally.
> 
> > - ATTACH will fail if the new device would shrink the aperture to
> >   exclude any already established mappings (I assume this is already
> >   the case)
> 
> Yes
> 
> > - IOAS_MAP gets an IOMAP_RESERVE flag, which operates a bit like a
> >   PROT_NONE mmap().  It reserves that IOVA space, so other (non-FIXED)
> >   MAPs won't use it, but doesn't actually put anything into the IO
> >   pagetables.
> >     - Like a regular mapping, ATTACHes that are incompatible with an
> >       IOMAP_RESERVEed region will fail
> >     - An IOMAP_RESERVEed area can be overmapped with an IOMAP_FIXED
> >       mapping
> 
> Yeah, this seems OK, I'm thinking a new API might make sense because
> you don't really want mmap replacement semantics but a permanent
> record of what IOVA must always be valid.
> 
> IOMMU_IOA_REQUIRE_IOVA perhaps, similar signature to
> IOMMUFD_CMD_IOAS_IOVA_RANGES:
> 
> struct iommu_ioas_require_iova {
>         __u32 size;
>         __u32 ioas_id;
>         __u32 num_iovas;
>         __u32 __reserved;
>         struct iommu_required_iovas {
>                 __aligned_u64 start;
>                 __aligned_u64 last;
>         } required_iovas[];
> };

As a permanent record, do we want to enforce that, once the required
range list is set, all FIXED and non-FIXED allocations must fall within
the listed ranges?

If yes, we can take the end of the last range as the maximum size of the
IOVA address space and use it to optimize the page table layout.

Otherwise, we may need another dedicated hint for that optimization.
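
To make the optimization concrete, here is a rough userspace-style sketch,
not kernel code. It takes a list of required ranges, finds the highest
inclusive end address, and derives how many page table levels would be
needed to cover it. The struct and function names are made up for
illustration, and the 4 KiB page / 9 bits per level layout is only an
assumption; real IOMMU page table formats differ.

```c
#include <stdint.h>

/* Hypothetical mirror of one entry in the proposed required_iovas[] array. */
struct required_iova {
	uint64_t start;
	uint64_t last;	/* inclusive end of the range */
};

/* Highest inclusive end address over all ranges; 0 for an empty list. */
static uint64_t max_required_last(const struct required_iova *r, unsigned int n)
{
	uint64_t max = 0;

	for (unsigned int i = 0; i < n; i++)
		if (r[i].last > max)
			max = r[i].last;
	return max;
}

/* Number of address bits needed to represent 'last' (at least 1). */
static unsigned int iova_bits(uint64_t last)
{
	unsigned int bits = 1;

	while (bits < 64 && (last >> bits))
		bits++;
	return bits;
}

/*
 * Levels needed for an address space of 'bits' bits.
 * ASSUMPTION: 4 KiB pages (12 offset bits) and 9 index bits per level.
 */
static unsigned int pgtable_levels(unsigned int bits)
{
	const unsigned int page_shift = 12, bits_per_level = 9;

	if (bits <= page_shift)
		return 1;
	return (bits - page_shift + bits_per_level - 1) / bits_per_level;
}
```

With this layout, a required list whose last range ends just below 2^39
would need three levels rather than the four a full 48-bit space needs,
which is the kind of saving the hint would enable.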

> 
> > So, for DPDK the sequence would be:
> >
> > 1. Create IOAS
> > 2. ATTACH devices
> > 3. IOAS_MAP some stuff
> > 4. Do DMA with the IOVAs that IOAS_MAP returned
> >
> > (Note, not even any need for QUERY in simple cases)
> 
> Yes, this is done already
> 
> > For (unoptimized) qemu it would be:
> >
> > 1. Create IOAS
> > 2. IOAS_MAP(IOMAP_FIXED|IOMAP_RESERVE) the valid IOVA regions of the
> >    guest platform
> > 3. ATTACH devices (this will fail if they're not compatible with the
> >    reserved IOVA regions)
> > 4. Boot the guest

I suppose the above is only a sample flow for the PPC vIOMMU. For non-PPC
vIOMMUs, regular mappings are required before booting the guest, and
reservation might be done but is not mandatory (at least, it is not
something current Qemu vfio can provide, as it simply replays the valid
ranges in the CPU address space).
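
For reference, the ATTACH check in step 3 of the quoted flow can be
modelled roughly as below. This is only a toy sketch of the rule "attach
fails if a reserved region falls outside the device's apertures". The
names are hypothetical, and it assumes each reserved range must fit
entirely inside a single aperture window, which the real implementation
may or may not require.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive IOVA range, used for apertures and reservations. */
struct iova_range {
	uint64_t start;
	uint64_t last;
};

/* True if r lies entirely inside one of the n_ap aperture windows. */
static bool range_in_apertures(const struct iova_range *r,
			       const struct iova_range *ap, unsigned int n_ap)
{
	for (unsigned int i = 0; i < n_ap; i++)
		if (ap[i].start <= r->start && r->last <= ap[i].last)
			return true;
	return false;
}

/*
 * Model of the attach check: the device is compatible only if every
 * reserved range is still covered by the device's apertures.
 * ASSUMPTION: a range spanning two adjacent windows is rejected.
 */
static bool attach_compatible(const struct iova_range *reserved,
			      unsigned int n_res,
			      const struct iova_range *ap, unsigned int n_ap)
{
	for (unsigned int i = 0; i < n_res; i++)
		if (!range_in_apertures(&reserved[i], ap, n_ap))
			return false;
	return true;
}
```

An empty reservation list is trivially compatible in this model, which
matches the "reservation is optional" case for non-PPC vIOMMUs.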

> >
> >   (on guest map/invalidate) -> IOAS_MAP(IOMAP_FIXED) to overmap part of
> >                                the reserved regions
> >   (on dev hotplug) -> ATTACH (which might fail, if it conflicts with the
> >                       reserved regions)
> >   (on vIOMMU reconfiguration) -> UNMAP/MAP reserved regions as
> >                                  necessary (which might fail)
> 
> OK, I will take care of it
> 
> Thanks,
> Jason



