Hi Robin, > From: Robin Murphy <robin.murphy@xxxxxxx> > Sent: Saturday, June 27, 2020 12:05 AM > > On 2020-06-26 08:47, Jean-Philippe Brucker wrote: > > On Wed, Jun 24, 2020 at 01:55:15AM -0700, Liu Yi L wrote: > >> IOMMUs that support nesting translation needs report the capability > >> info to userspace, e.g. the format of first level/stage paging structures. > >> > >> This patch reports nesting info by DOMAIN_ATTR_NESTING. Caller can > >> get nesting info after setting DOMAIN_ATTR_NESTING. > >> > >> v2 -> v3: > >> *) remvoe cap/ecap_mask in iommu_nesting_info. > >> *) reuse DOMAIN_ATTR_NESTING to get nesting info. > >> *) return an empty iommu_nesting_info for SMMU drivers per Jean' > >> suggestion. > >> > >> Cc: Kevin Tian <kevin.tian@xxxxxxxxx> > >> CC: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> > >> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx> > >> Cc: Eric Auger <eric.auger@xxxxxxxxxx> > >> Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx> > >> Cc: Joerg Roedel <joro@xxxxxxxxxx> > >> Cc: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx> > >> Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx> > >> Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> > >> --- > >> drivers/iommu/arm-smmu-v3.c | 29 ++++++++++++++++++++-- > >> drivers/iommu/arm-smmu.c | 29 ++++++++++++++++++++-- > > > > Looks reasonable to me. Please move the SMMU changes to a separate > > patch and Cc the SMMU maintainers: > > Cheers Jean, I'll admit I've been skipping over a lot of these patches lately :) > > A couple of comments below... > > > > > Cc: Will Deacon <will@xxxxxxxxxx> > > Cc: Robin Murphy <robin.murphy@xxxxxxx> > > > > Thanks, > > Jean > > > >> include/uapi/linux/iommu.h | 59 > +++++++++++++++++++++++++++++++++++++++++++++ > >> 3 files changed, 113 insertions(+), 4 deletions(-) > >> > >> diff --git a/drivers/iommu/arm-smmu-v3.c > >> b/drivers/iommu/arm-smmu-v3.c index f578677..0c45d4d 100644 > >> --- a/drivers/iommu/arm-smmu-v3.c > >> +++ b/drivers/iommu/arm-smmu-v3.c > >> @@ -3019,6 +3019,32 @@ static struct iommu_group > *arm_smmu_device_group(struct device *dev) > >> return group; > >> } > >> > >> +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain > *smmu_domain, > >> + void *data) > >> +{ > >> + struct iommu_nesting_info *info = (struct iommu_nesting_info *) data; > >> + u32 size; > >> + > >> + if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED) > >> + return -ENODEV; > >> + > >> + size = sizeof(struct iommu_nesting_info); > >> + > >> + /* > >> + * if provided buffer size is not equal to the size, should > >> + * return 0 and also the expected buffer size to caller. > >> + */ > >> + if (info->size != size) { > >> + info->size = size; > >> + return 0; > >> + } > >> + > >> + /* report an empty iommu_nesting_info for now */ > >> + memset(info, 0x0, size); > >> + info->size = size; > >> + return 0; > >> +} > >> + > >> static int arm_smmu_domain_get_attr(struct iommu_domain *domain, > >> enum iommu_attr attr, void *data) > >> { > >> @@ -3028,8 +3054,7 @@ static int arm_smmu_domain_get_attr(struct > iommu_domain *domain, > >> case IOMMU_DOMAIN_UNMANAGED: > >> switch (attr) { > >> case DOMAIN_ATTR_NESTING: > >> - *(int *)data = (smmu_domain->stage == > ARM_SMMU_DOMAIN_NESTED); > >> - return 0; > >> + return arm_smmu_domain_nesting_info(smmu_domain, > data); > >> default: > >> return -ENODEV; > >> } > >> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > >> index 243bc4c..908607d 100644 > >> --- a/drivers/iommu/arm-smmu.c > >> +++ b/drivers/iommu/arm-smmu.c > >> @@ -1506,6 +1506,32 @@ static struct iommu_group > *arm_smmu_device_group(struct device *dev) > >> return group; > >> } > >> > >> +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain > *smmu_domain, > >> + void *data) > >> +{ > >> + struct iommu_nesting_info *info = (struct iommu_nesting_info *) data; > >> + u32 size; > >> + > >> + if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED) > >> + return -ENODEV; > >> + > >> + size = sizeof(struct iommu_nesting_info); > >> + > >> + /* > >> + * if provided buffer size is not equal to the size, should > >> + * return 0 and also the expected buffer size to caller. > >> + */ > >> + if (info->size != size) { > >> + info->size = size; > >> + return 0; > >> + } > >> + > >> + /* report an empty iommu_nesting_info for now */ > >> + memset(info, 0x0, size); > >> + info->size = size; > >> + return 0; > >> +} > >> + > >> static int arm_smmu_domain_get_attr(struct iommu_domain *domain, > >> enum iommu_attr attr, void *data) > >> { > >> @@ -1515,8 +1541,7 @@ static int arm_smmu_domain_get_attr(struct > iommu_domain *domain, > >> case IOMMU_DOMAIN_UNMANAGED: > >> switch (attr) { > >> case DOMAIN_ATTR_NESTING: > >> - *(int *)data = (smmu_domain->stage == > ARM_SMMU_DOMAIN_NESTED); > >> - return 0; > >> + return arm_smmu_domain_nesting_info(smmu_domain, > data); > >> default: > >> return -ENODEV; > >> } > >> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h > >> index 1afc661..898c99a 100644 > >> --- a/include/uapi/linux/iommu.h > >> +++ b/include/uapi/linux/iommu.h > >> @@ -332,4 +332,63 @@ struct iommu_gpasid_bind_data { > >> } vendor; > >> }; > >> > >> +/* > >> + * struct iommu_nesting_info - Information for nesting-capable IOMMU. > >> + * user space should check it before using > >> + * nesting capability. > >> + * > >> + * @size: size of the whole structure > >> + * @format: PASID table entry format, the same definition with > >> + * @format of struct iommu_gpasid_bind_data. > >> + * @features: supported nesting features. > >> + * @flags: currently reserved for future extension. > >> + * @data: vendor specific cap info. > >> + * > >> + * +---------------+----------------------------------------------------+ > >> + * | feature | Notes | > >> + * > >> > ++===============+============================================ > ======= > >> +=+ > >> + * | SYSWIDE_PASID | Kernel manages PASID in system wide, PASIDs used | > >> + * | | in the system should be allocated by host kernel | > >> + * +---------------+----------------------------------------------------+ > >> + * | BIND_PGTBL | bind page tables to host PASID, the PASID could | > >> + * | | either be a host PASID passed in bind request or | > >> + * | | default PASIDs (e.g. default PASID of aux-domain) | > >> + * +---------------+----------------------------------------------------+ > >> + * | CACHE_INVLD | mandatory feature for nesting capable IOMMU | > >> + * > >> ++---------------+--------------------------------------------------- > >> +-+ > >> + * > >> + */ > >> +struct iommu_nesting_info { > >> + __u32 size; > >> + __u32 format; > >> + __u32 features; > >> +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID (1 << 0) > >> +#define IOMMU_NESTING_FEAT_BIND_PGTBL (1 << 1) > >> +#define IOMMU_NESTING_FEAT_CACHE_INVLD (1 << 2) > >> + __u32 flags; > >> + __u8 data[]; > >> +}; > >> + > >> +/* > >> + * struct iommu_nesting_info_vtd - Intel VT-d specific nesting info > >> + * > >> + * > >> + * @flags: VT-d specific flags. Currently reserved for future > >> + * extension. > >> + * @addr_width: The output addr width of first level/stage translation > >> + * @pasid_bits: Maximum supported PASID bits, 0 represents no PASID > >> + * support. > >> + * @cap_reg: Describe basic capabilities as defined in VT-d capability > >> + * register. > >> + * @ecap_reg: Describe the extended capabilities as defined in VT-d > >> + * extended capability register. > >> + */ > >> +struct iommu_nesting_info_vtd { > >> + __u32 flags; > >> + __u16 addr_width; > > I think this might be worth promoting to a generic feature - Arm has the same > notion of intermediate address size, and I'd imagine that pretty much any other > two-stage translation system would as well (either explicitly or implicitly). > It also > comes close to something the DPDK folks raised where they wanted parity with a > feature that currently scrapes AGAW out of some VT-d-specific place, so > abstracting it to completely generic code, in a way that could eventually be > generalised to reporting info for non-nested domains too, would be really nice. got you. I can do that. > What would also be cool is if the user was able to pass in a structure with > preferred values for the address size and other capabilities when they request > nesting in the first place. Right now we'll always set up the maximum possible > sized page table for any domain, but if we knew ahead of time how many bits the > user actually cared about then we could potentially be more efficient (e.g. use > fewer levels of pagetable or a different translation granule). agreed, and I guess only the configurable caps (like the addr_width, domain could have different addr_width per user request). I think it may be an optimization afterward. Here, we report all the nesting related caps to user, thus user could either do pre-check or expose correct capability to guest per hardware support. This is necesary as nesting requires guest to maintain page tables per hw supporting. Regards, Yi Liu > Robin. > > >> + __u16 pasid_bits; > >> + __u64 cap_reg; > >> + __u64 ecap_reg; > >> +}; > >> + > >> #endif /* _UAPI_IOMMU_H */ > >> -- > >> 2.7.4 > >>