RE: [RFC v2] /dev/iommu uAPI proposal

"Tian, Kevin" <kevin.tian@xxxxxxxxx> · Fri, 16 Jul 2021 01:20:15 +0000

> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Friday, July 16, 2021 2:13 AM
> 
> On Thu, Jul 15, 2021 at 11:05:45AM -0700, Raj, Ashok wrote:
> > On Thu, Jul 15, 2021 at 02:53:36PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Jul 15, 2021 at 10:48:36AM -0700, Raj, Ashok wrote:
> > >
> > > > > > > Do we have any isolation requirements here? its the same process. So if the
> > > > > > > page-request it sent to guest and even if you report it for mdev1, after
> > > > > > > the PRQ is resolved by guest, the request from mdev2 from the same guest
> > > > > > > should simply work?
> > > > >
> > > > > I think we already talked about this and said it should not be done.
> > > >
> > > > I get the should not be done, I'm wondering where should that be
> > > > implemented?
> > >
> > > The iommu layer cannot have ambiguity. Every RID or RID,PASID slot
> > > must have only one device attached to it. Attempting to connect two
> > > devices to the same slot fails on the iommu layer.
> >
> > I guess we are talking about two different things. I was referring to SVM
> > side of things. Maybe you are referring to the mdev.
> 
> I'm talking about in the hypervisor.
> 
> As I've said already, the vIOMMU interface is the problem here. The
> guest VM should be able to know that it cannot use PASID 1 with two
> devices, like the hypervisor knows. At the very least it should be
> able to know that the PASID binding has failed and relay that failure
> back to the process.
> 
> Ideally the guest would know it should allocate another PASID for
> these cases.
> 
> But yes, if mdevs are going to be modeled with RIDs in the guest then
> with the current vIOMMU we cannot cause a single hypervisor RID to
> show up as two RIDs in the guest without breaking the vIOMMU model.
> 

To summarize, for vIOMMU we can work with the spec owner to
define a proper interface to feed such a restriction back to the guest
if necessary. For the kernel part, it's clear that the IOMMU fd should
disallow attaching two devices to a single [RID] or [RID, PASID] slot
in the first place.
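
As a rough illustration of the kernel-side rule only (not the actual
IOMMU fd code), the check could look like the sketch below. The names
slot_table, slot_attach, NO_PASID and dev_id are all made up for this
example, and a fixed-size array stands in for whatever structure the
real implementation would use.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_PASID  UINT32_MAX  /* hypothetical marker: slot keyed by RID only */
#define MAX_SLOTS 16

struct slot {
	bool     used;
	uint16_t rid;     /* physical requester ID */
	uint32_t pasid;   /* physical PASID, or NO_PASID */
	int      dev_id;  /* hypothetical handle of the attached device */
};

struct slot_table {
	struct slot slots[MAX_SLOTS];
};

/* Attach dev_id to [rid] or [rid, pasid]; refuse if the slot is taken. */
static int slot_attach(struct slot_table *t, uint16_t rid, uint32_t pasid,
		       int dev_id)
{
	struct slot *free_slot = NULL;

	assert(t != NULL);
	for (size_t i = 0; i < MAX_SLOTS; i++) {
		struct slot *s = &t->slots[i];

		if (s->used && s->rid == rid && s->pasid == pasid)
			return -EBUSY;  /* a device already owns this slot */
		if (!s->used && !free_slot)
			free_slot = s;
	}
	if (!free_slot)
		return -ENOSPC;

	*free_slot = (struct slot){ .used = true, .rid = rid,
				    .pasid = pasid, .dev_id = dev_id };
	return 0;
}
```

With this sketch, a second attach to the same [RID, PASID] pair returns
-EBUSY, while the same RID with a different PASID still succeeds.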

Then the next question is how to communicate such a restriction
to userspace. It sounds like a group, but is different in concept.
An iommu group describes the minimal isolation boundary, thus all
devices in the group can only be assigned to a single user. But this
case is the opposite: the two mdevs (both supporting ENQCMD submission)
with the same parent have a problem when assigned to a single VM
(in this case vPASID is translated VM-wide, thus the same pPASID will be
used across both mdevs), while they work well when
assigned to different VMs (completely different vPASID spaces, thus
different pPASIDs).
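
To make the difference from a group concrete, here is a toy model of
the VM-wide translation. It assumes a host-wide pPASID allocator and
one vPASID->pPASID table per VM; struct vm, vm_translate, mdev_slot and
slot_conflict are illustrative names, not anything from the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VPASID 8

static uint32_t next_ppasid = 100;  /* hypothetical host-wide allocator */

struct vm {
	uint32_t v2p[MAX_VPASID];   /* 0 means "not translated yet" */
};

/* VM-wide translation: one pPASID per vPASID, whichever mdev uses it. */
static uint32_t vm_translate(struct vm *vm, uint32_t vpasid)
{
	assert(vpasid < MAX_VPASID);
	if (vm->v2p[vpasid] == 0)
		vm->v2p[vpasid] = next_ppasid++;
	return vm->v2p[vpasid];
}

struct slot_key {
	uint16_t parent_rid;  /* mdevs share the RID of their parent */
	uint32_t ppasid;
};

/* The [RID, PASID] slot an mdev would occupy for a given guest vPASID. */
static struct slot_key mdev_slot(struct vm *vm, uint16_t parent_rid,
				 uint32_t vpasid)
{
	return (struct slot_key){ parent_rid, vm_translate(vm, vpasid) };
}

static bool slot_conflict(struct slot_key a, struct slot_key b)
{
	return a.parent_rid == b.parent_rid && a.ppasid == b.ppasid;
}
```

In this model two mdevs of one parent using vPASID 1 in the same VM
land on the same slot, while the same two mdevs in different VMs get
different pPASIDs and do not conflict.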

One thought is to have the vfio device driver deal with it. In this proposal
it is the vfio device driver that defines the PASID virtualization policy and
reports it to userspace via VFIO_DEVICE_GET_INFO. The driver understands
the restriction and thus could just hide the vPASID capability when the user
calls GET_INFO on the 2nd mdev in the above scenario. In this way the
user doesn't even need to know about the restriction, and both mdevs
can be assigned to a single VM w/o any problem.
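
A minimal sketch of that driver-side policy, under the assumption that
the parent driver can tell which user (VM) an mdev belongs to.
SKETCH_CAP_VPASID, struct parent_dev, struct mdev and
mdev_get_info_caps are hypothetical names for illustration; none of
them are existing VFIO definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CAP_VPASID (1u << 0)  /* hypothetical flag, not a VFIO bit */
#define MAX_GRANTS 8

struct mdev;

/* Records which mdev of a given user was granted the vPASID capability. */
struct vpasid_grant {
	int                user_id;
	const struct mdev *owner;
};

struct parent_dev {
	struct vpasid_grant grant[MAX_GRANTS];
	size_t              nr_grants;
};

struct mdev {
	struct parent_dev *parent;
	int                user_id;  /* hypothetical id of the assigning user/VM */
};

/* Capability flags as the driver might report them through GET_INFO. */
static uint32_t mdev_get_info_caps(struct mdev *m)
{
	struct parent_dev *p = m->parent;

	assert(p != NULL);
	for (size_t i = 0; i < p->nr_grants; i++) {
		if (p->grant[i].user_id != m->user_id)
			continue;
		/* Same user: only the first mdev keeps the capability. */
		return p->grant[i].owner == m ? SKETCH_CAP_VPASID : 0;
	}
	if (p->nr_grants == MAX_GRANTS)
		return 0;  /* out of tracking space: stay on the safe side */

	p->grant[p->nr_grants].user_id = m->user_id;
	p->grant[p->nr_grants].owner = m;
	p->nr_grants++;
	return SKETCH_CAP_VPASID;
}
```

Here the first mdev a user queries keeps the capability on repeated
calls, a second mdev of the same parent and user sees it hidden, and
an mdev owned by a different user is unaffected.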

Does this sound like the right approach?

Thanks
Kevin