Hi Yi,

On 7/12/20 1:20 PM, Liu Yi L wrote:
> IOMMUs that support nesting translation needs report the capability info
s/needs/need to report
> to userspace, e.g. the format of first level/stage paging structures.
It gives information about requirements the userspace needs to implement
plus other features characterizing the physical implementation.
>
> This patch reports nesting info by DOMAIN_ATTR_NESTING. Caller can get
> nesting info after setting DOMAIN_ATTR_NESTING.
I guess you meant after selecting VFIO_TYPE1_NESTING_IOMMU?
>
> Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
> CC: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Cc: Eric Auger <eric.auger@xxxxxxxxxx>
> Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> Cc: Joerg Roedel <joro@xxxxxxxxxx>
> Cc: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx>
> Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx>
> ---
> v4 -> v5:
> *) address comments from Eric Auger.
>
> v3 -> v4:
> *) split the SMMU driver changes to be a separate patch
> *) move the @addr_width and @pasid_bits from vendor specific
>    part to generic part.
> *) tweak the description for the @features field of struct
>    iommu_nesting_info.
> *) add description on the @data[] field of struct iommu_nesting_info
>
> v2 -> v3:
> *) remvoe cap/ecap_mask in iommu_nesting_info.
> *) reuse DOMAIN_ATTR_NESTING to get nesting info.
> *) return an empty iommu_nesting_info for SMMU drivers per Jean'
>    suggestion.
> ---
>  include/uapi/linux/iommu.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 77 insertions(+)
>
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 1afc661..d2a47c4 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -332,4 +332,81 @@ struct iommu_gpasid_bind_data {
>  	} vendor;
>  };
>
> +/*
> + * struct iommu_nesting_info - Information for nesting-capable IOMMU.
> + *                            user space should check it before using
> + *                            nesting capability.
> + *
> + * @size:       size of the whole structure
> + * @format:     PASID table entry format, the same definition as struct
> + *              iommu_gpasid_bind_data @format.
> + * @features:   supported nesting features.
> + * @flags:      currently reserved for future extension.
> + * @addr_width: The output addr width of first level/stage translation
> + * @pasid_bits: Maximum supported PASID bits, 0 represents no PASID
> + *              support.
> + * @data:       vendor specific cap info. data[] structure type can be deduced
> + *              from @format field.
> + *
> + * +===============+======================================================+
> + * | feature       | Notes                                                |
> + * +===============+======================================================+
> + * | SYSWIDE_PASID | PASIDs are managed in system-wide, instead of per    |
s/in system-wide/system-wide ?
> + * |               | device. When a device is assigned to userspace or    |
> + * |               | VM, proper uAPI (userspace driver framework uAPI,    |
> + * |               | e.g. VFIO) must be used to allocate/free PASIDs for  |
> + * |               | the assigned device.                                 |
Isn't it possible to be more explicit, something like:
System-wide PASID management is mandated by the physical IOMMU. All
PASID allocation must be mediated through the TBD API.
> + * +---------------+------------------------------------------------------+
> + * | BIND_PGTBL    | The owner of the first level/stage page table must   |
> + * |               | explicitly bind the page table to associated PASID   |
> + * |               | (either the one specified in bind request or the     |
> + * |               | default PASID of iommu domain), through userspace    |
> + * |               | driver framework uAPI (e.g. VFIO_IOMMU_NESTING_OP).  |
As per your answer in https://lkml.org/lkml/2020/7/6/383, I now
understand that ARM would not expose the BIND_PGTBL nesting feature.
Still, I think the above wording is a bit confusing. Maybe you could
explicitly talk about the PASID *entry* that needs to be passed from
guest to host.
On ARM we directly pass the PASID table, but when reading the above
description I cannot determine whether that case fits the description
or not.
> + * +---------------+------------------------------------------------------+
> + * | CACHE_INVLD   | The owner of the first level/stage page table must   |
> + * |               | explicitly invalidate the IOMMU cache through uAPI   |
> + * |               | provided by userspace driver framework (e.g. VFIO)   |
> + * |               | according to vendor-specific requirement when        |
> + * |               | changing the page table.                             |
> + * +---------------+------------------------------------------------------+
Instead of using "uAPI provided by userspace driver framework (e.g.
VFIO)", can't we use the so-called IOMMU UAPI terminology, which now has
userspace documentation?
> + *
> + * @data[] types defined for @format:
> + * +================================+=====================================+
> + * | @format                        | @data[]                             |
> + * +================================+=====================================+
> + * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd       |
> + * +--------------------------------+-------------------------------------+
> + *
> + */
> +struct iommu_nesting_info {
> +	__u32	size;
Shouldn't it be @argsz, to fit the IOMMU uAPI convention, and take the
opportunity to put the flags field just below it?
> +	__u32	format;
> +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
> +#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
> +#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
> +	__u32	features;
> +	__u32	flags;
> +	__u16	addr_width;
> +	__u16	pasid_bits;
> +	__u32	padding;
> +	__u8	data[];
> +};
> +
> +/*
> + * struct iommu_nesting_info_vtd - Intel VT-d specific nesting info
> + *
> + * @flags:	VT-d specific flags. Currently reserved for future
> + *		extension.
must be set to 0?
> + * @cap_reg:	Describe basic capabilities as defined in VT-d capability
> + *		register.
> + * @ecap_reg:	Describe the extended capabilities as defined in VT-d
> + *		extended capability register.
> + */
> +struct iommu_nesting_info_vtd {
> +	__u32	flags;
> +	__u32	padding;
> +	__u64	cap_reg;
> +	__u64	ecap_reg;
> +};
> +
>  #endif /* _UAPI_IOMMU_H */

Thanks

Eric
>
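
P.S. To illustrate why the size (or argsz) and reserved flags questions
matter, here is a rough sketch of the check a userspace consumer might
run on the returned blob. It is only an illustration under assumptions:
the struct layouts mirror the quoted patch, but check_nesting_info() and
EXAMPLE_FORMAT_INTEL_VTD are hypothetical names, and treating non-zero
reserved flags as an error is my assumption, not something the patch
states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the structures proposed in the quoted patch. */
struct iommu_nesting_info {
	uint32_t size;
	uint32_t format;
#define IOMMU_NESTING_FEAT_SYSWIDE_PASID	(1 << 0)
#define IOMMU_NESTING_FEAT_BIND_PGTBL		(1 << 1)
#define IOMMU_NESTING_FEAT_CACHE_INVLD		(1 << 2)
	uint32_t features;
	uint32_t flags;
	uint16_t addr_width;
	uint16_t pasid_bits;
	uint32_t padding;
	uint8_t  data[];
};

struct iommu_nesting_info_vtd {
	uint32_t flags;
	uint32_t padding;
	uint64_t cap_reg;
	uint64_t ecap_reg;
};

/* Placeholder value for illustration; real code would use the uapi header. */
#define EXAMPLE_FORMAT_INTEL_VTD	1

/*
 * Hypothetical helper: validate a nesting info blob of @len bytes and copy
 * out the VT-d specific part. Returns 0 on success, -1 if the blob is
 * truncated, uses reserved flags, or carries an unexpected format.
 */
static int check_nesting_info(const void *buf, size_t len,
			      struct iommu_nesting_info_vtd *vtd_out)
{
	struct iommu_nesting_info hdr;

	assert(buf != NULL && vtd_out != NULL);

	/* The generic header must be fully present before reading it. */
	if (len < sizeof(hdr))
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));

	/* The self-described size must fit inside the caller's buffer. */
	if (hdr.size < sizeof(hdr) || hdr.size > len)
		return -1;

	/* Assumption: reserved flags are required to be zero. */
	if (hdr.flags != 0)
		return -1;

	/* The type of data[] is deduced from @format. */
	if (hdr.format != EXAMPLE_FORMAT_INTEL_VTD)
		return -1;
	if (hdr.size < sizeof(hdr) + sizeof(*vtd_out))
		return -1;

	/* memcpy avoids any alignment assumption about the raw buffer. */
	memcpy(vtd_out, (const uint8_t *)buf + sizeof(hdr), sizeof(*vtd_out));
	if (vtd_out->flags != 0)
		return -1;

	return 0;
}
```

A caller would pass the buffer filled in by the kernel together with its
length, and only look at features, cap_reg and ecap_reg once the helper
returns 0.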