RE: [RFC v16 1/9] iommu: Introduce attach/detach_pasid_table API

> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Thursday, December 9, 2021 2:31 AM
> 
> On Wed, Dec 08, 2021 at 05:20:39PM +0000, Jean-Philippe Brucker wrote:
> > On Wed, Dec 08, 2021 at 08:56:16AM -0400, Jason Gunthorpe wrote:
> > > From a progress perspective I would like to start with simple 'page
> > > tables in userspace', ie no PASID in this step.
> > >
> > > 'page tables in userspace' means an iommufd ioctl to create an
> > > iommu_domain where the IOMMU HW is directly traversing a
> > > device-specific page table structure in user space memory. All the HW
> > > today implements this by using another iommu_domain to allow the IOMMU
> > > HW DMA access to user memory - ie nesting or multi-stage or whatever.
> > >
> > > This would come along with some ioctls to invalidate the IOTLB.
> > >
> > > I'm imagining this step as a iommu_group->op->create_user_domain()
> > > driver callback which will create a new kind of domain with
> > > domain-unique ops. Ie map/unmap related should all be NULL as those
> > > are impossible operations.
> > >
> > > From there the usual struct device (ie RID) attach/detach stuff needs
> > > to take care of routing DMAs to this iommu_domain.
> > >
> > > Step two would be to add the ability for an iommufd using driver to
> > > request that a RID&PASID is connected to an iommu_domain. This
> > > connection can be requested for any kind of iommu_domain, kernel owned
> > > or user owned.
> > >
> > > I don't quite have an answer how exactly the SMMUv3 vs Intel
> > > difference in PASID routing should be resolved.
> >
> > In SMMUv3 the user pgd is always stored in the PASID table (actually
> > called "context descriptor table" but I want to avoid confusion with
> > the VT-d "context table"). And to access the PASID table, the SMMUv3 first
> > translates its GPA into a PA using the stage-2 page table. For userspace to
> > pass individual pgds to the kernel, as opposed to passing whole PASID
> > tables, the host kernel needs to reserve GPA space and map it in stage-2,
> > so it can store the PASID table in there. Userspace manages GPA space.
> 
> It is what I thought.. So in the SMMUv3 spec the STE is completely in
> kernel memory, but it points to an S1ContextPtr that must be an IPA if
> the "stage 1 translation tables" are IPA. Only via S1ContextPtr can we
> decode the substream?
> 
> So in SMMUv3 land we don't really ever talk about PASID, we have a
> 'user page table' that is bound to an entire RID and *all* PASIDs.
> 
> While Intel would have a 'user page table' that is only bound to a RID
> & PASID
> 
> Certainly it is not a difference we can hide from userspace.

Conceptually it is still a 'user page table' with a vendor-specific format.

Taking your earlier analogy, it's just a single 84-bit address space
(20-bit PASID + 64-bit VA) per RID.
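
To make the 84-bit analogy concrete, here is a minimal sketch in C. The
names (struct rid_addr, rid_addr_make, rid_addr_cmp) are hypothetical and
do not come from any kernel API; the sketch only illustrates that a
20-bit PASID placed above a 64-bit VA gives one flat per-RID address.

```c
#include <assert.h>
#include <stdint.h>

#define PASID_BITS 20u
#define PASID_MAX  ((1u << PASID_BITS) - 1u)

/*
 * Hypothetical illustration only: an 84-bit per-RID address, kept as
 * two machine words because no standard C integer type is 84 bits wide.
 */
struct rid_addr {
	uint32_t pasid;	/* upper 20 bits of the 84-bit address */
	uint64_t va;	/* lower 64 bits of the 84-bit address */
};

static struct rid_addr rid_addr_make(uint32_t pasid, uint64_t va)
{
	struct rid_addr a;

	assert(pasid <= PASID_MAX);	/* PASID must fit in 20 bits */
	a.pasid = pasid;
	a.va = va;
	return a;
}

/* Compare two addresses as 84-bit unsigned values (PASID is most significant). */
static int rid_addr_cmp(struct rid_addr x, struct rid_addr y)
{
	if (x.pasid != y.pasid)
		return x.pasid < y.pasid ? -1 : 1;
	if (x.va != y.va)
		return x.va < y.va ? -1 : 1;
	return 0;
}
```

With this ordering, any address in PASID 1 sorts above every address in
PASID 0, which is all the "single address space per RID" view requires.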

So what we require is still one unified ioctl in your step-1 to support
a per-RID 'user page table'.

For ARM it's the SMMU's PASID table format. There is no step-2 since PASID
is already within the address space covered by the user PASID table.

For Intel it's VT-d's 1st level page table format. Moving to step-2
then allows multiple 'user page tables', each connected to a RID & PASID.
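
A rough sketch of how one step-1 request could carry either vendor
format. Everything here (the enum, the struct and the helper names) is
hypothetical and is not an existing iommufd uAPI; it only encodes the
ARM vs Intel distinction described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vendor formats for a per-RID 'user page table'. */
enum user_pgtbl_format {
	USER_PGTBL_SMMUV3_PASID_TABLE,	/* ARM: user-owned PASID table */
	USER_PGTBL_VTD_FIRST_LEVEL,	/* Intel: VT-d 1st level page table */
};

/* Hypothetical argument of the unified step-1 ioctl. */
struct user_pgtbl_req {
	enum user_pgtbl_format format;
	uint64_t base;	/* guest address of the vendor-specific structure */
};

static void user_pgtbl_req_init(struct user_pgtbl_req *req,
				enum user_pgtbl_format format, uint64_t base)
{
	assert(req != NULL);
	req->format = format;
	req->base = base;
}

/* ARM: the user PASID table already covers every PASID of the RID. */
static bool user_pgtbl_covers_all_pasids(enum user_pgtbl_format format)
{
	return format == USER_PGTBL_SMMUV3_PASID_TABLE;
}

/* Intel: step-2 can connect further user page tables to RID & PASID. */
static bool user_pgtbl_allows_pasid_attach(enum user_pgtbl_format format)
{
	return format == USER_PGTBL_VTD_FIRST_LEVEL;
}
```

The point of the sketch is that the request shape stays the same for
both vendors; only the format tag decides whether a step-2 per-PASID
attach makes sense afterwards.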

> 
> > This would be easy for a single pgd. In this case the PASID table has a
> > single entry and userspace could just pass one GPA page during
> > registration. However it isn't easily generalized to full PASID support,
> > because managing a multi-level PASID table will require runtime GPA
> > allocation, and that API is awkward. That's why we opted for "attach PASID
> > table" operation rather than "attach page table" (back then the choice was
> > easy since VT-d used the same concept).
> 
> I think the entire context descriptor table should be in userspace,
> and filled in by userspace, as part of the userspace page table.
> 
> The kernel API should accept the S1ContextPtr IPA and all the parts of
> the STE that relate to defining the layout of what the S1Context
> points to and that's it.
> 
> We should have another mode where the kernel owns everything, and the
> S1ContextPtr is a PA with Stage 2 bypassed.

I guess this is for usages like DPDK. In that case, yes, we can have a
unified ioctl since the kernel manages everything including the PASID
table.

> 
> That part is fine, the more open question is what does the driver
> interface look like when userspace tells something like vfio-pci to
> connect to this thing. At some level the attaching device needs to
> authorize iommufd to take the entire PASID table and RID.

As long as the SMMU driver advertises support for only the step-1 ioctl,
this authorization should already be implied.

> 
> Specifically we cannot use this thing with a mdev, while the Intel
> version of a userspace page table can be.

Yes. Supporting mdev is the whole reason why Intel puts the PASID
table in host physical address space to be managed by the kernel.

> 
> Maybe that is just some 'allow whole device' flag in an API
> 

As said, I feel this special flag is not required as long as the
vendor iommu driver only supports your step-1 interface, which
implies 'allow whole device' for ARM.

Thanks
Kevin



