RE: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO

> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Thursday, October 14, 2021 11:43 PM
> 
> > > > I think the key is whether other archs allow driver to decide DMA
> > > > coherency and indirectly the underlying I/O page table format.
> > > > If yes, then I don't see a reason why such decision should not be
> > > > given to userspace for passthrough case.
> > >
> > > The choice all comes down to if the other arches have cache
> > > maintenance instructions in the VM that *don't work*
> >
> > Looks vfio always sets IOMMU_CACHE on all platforms as long as
> > iommu supports it (true on all platforms except intel iommu which
> > is dedicated for GPU):
> >
> > vfio_iommu_type1_attach_group()
> > {
> > 	...
> > 	if (iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
> > 		domain->prot |= IOMMU_CACHE;
> > 	...
> > }
> >
> > Should above be set according to whether a device is coherent?
> 
> For IOMMU_CACHE there are two questions related to the overloaded
> meaning:
> 
>  - Should VFIO ask the IOMMU to use non-coherent DMA (ARM meaning)
>    This depends on how the VFIO user expects to operate the DMA.
>    If the VFIO user can issue cache maintenance ops then IOMMU_CACHE
>    should be controlled by the user. I have no idea what platforms
>    support user space cache maintenance ops.

But, as you say for the Intel meaning below, even if those ops are
privileged, a uAPI can be provided to support such usage if necessary.

> 
>  - Should VFIO ask the IOMMU to suppress no-snoop (Intel meaning)
>    This depends if the VFIO user has access to wbinvd or not.
> 
>    wbinvd is a privileged instruction so normally userspace will not
>    be able to access it.
> 
>    Per Paolo recommendation there should be a uAPI someplace that
>    allows userspace to issue wbinvd - basically the suppress no-snoop
>    is also user controllable.
> 
> The two things are very similar and ultimately are a choice userspace
> should be making.

Yes.

> 
> From something like a qemu perspective things are more murky - eg on
> ARM qemu needs to co-ordinate with the guest. Whatever IOMMU_CACHE
> mode VFIO is using must match the device coherent flag in the Linux
> guest. I'm guessing all Linux guest VMs only use coherent DMA for all
> devices today. I don't know if the cache maintenance ops are even
> permitted in an ARM VM.
> 

I'll leave it to Jean to confirm. If only coherent DMA can be used in
the guest on other platforms, then I suppose VFIO should not blindly
set IOMMU_CACHE, and in principle it should refuse to assign a
non-coherent device, since no coordination with the guest exists today.

So the bottom line is that we'll keep this no-snoop thing Intel-specific.
For the basic skeleton we won't support no-snoop, so the user needs to
set the enforce-snoop flag when creating an IOAS, as this RFC v1 does.
We also need to introduce a new flag instead of abusing IOMMU_CACHE in
the kernel. Other platforms may need a fix to deny non-coherent devices
for now (depending on the open question above).
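
To make the summary above concrete, here is a rough standalone model of
the attach-time decision. It is only a sketch, not kernel code: the flag
name, the struct and its fields, the function, and the errno choices are
all hypothetical and appear in neither this RFC nor the kernel. It also
folds the Intel case (enforce-snoop) and the other-platform case (deny
non-coherent devices) into a single check, which is a simplifying
assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag a user would pass when creating an IOAS. */
#define IOAS_FLAG_ENFORCE_SNOOP (1u << 0)

/* Hypothetical summary of the inputs that matter when attaching a device. */
struct attach_ctx {
	unsigned int ioas_flags;      /* flags given at IOAS creation */
	bool iommu_can_enforce_snoop; /* IOMMU can suppress no-snoop */
	bool dev_coherent;            /* device DMA is cache coherent */
};

/* Returns 0 if the attach may proceed, a negative errno otherwise. */
static int check_attach(const struct attach_ctx *ctx)
{
	/* Basic skeleton: no-snoop is unsupported, so the user must opt in. */
	if (!(ctx->ioas_flags & IOAS_FLAG_ENFORCE_SNOOP))
		return -EINVAL;

	/* The IOMMU must actually be able to honor enforce-snoop. */
	if (!ctx->iommu_can_enforce_snoop)
		return -EOPNOTSUPP;

	/* No coordination with the guest exists: deny non-coherent devices. */
	if (!ctx->dev_coherent)
		return -EPERM;

	return 0;
}
```

The point of a dedicated flag is that the snoop decision becomes an
explicit input of its own rather than something inferred from
IOMMU_CACHE.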

Thanks
Kevin



