> From: Liu, Yi L <yi.l.liu@xxxxxxxxx> > Sent: Saturday, June 27, 2020 2:53 PM > > Hi Robin, > > > From: Robin Murphy <robin.murphy@xxxxxxx> > > Sent: Saturday, June 27, 2020 12:05 AM > > > > On 2020-06-26 08:47, Jean-Philippe Brucker wrote: > > > On Wed, Jun 24, 2020 at 01:55:15AM -0700, Liu Yi L wrote: > > >> IOMMUs that support nesting translation needs report the capability > > >> info to userspace, e.g. the format of first level/stage paging structures. > > >> > > >> This patch reports nesting info by DOMAIN_ATTR_NESTING. Caller can > > >> get nesting info after setting DOMAIN_ATTR_NESTING. > > >> > > >> v2 -> v3: > > >> *) remvoe cap/ecap_mask in iommu_nesting_info. > > >> *) reuse DOMAIN_ATTR_NESTING to get nesting info. > > >> *) return an empty iommu_nesting_info for SMMU drivers per Jean' > > >> suggestion. > > >> > > >> Cc: Kevin Tian <kevin.tian@xxxxxxxxx> > > >> CC: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> > > >> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx> > > >> Cc: Eric Auger <eric.auger@xxxxxxxxxx> > > >> Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx> > > >> Cc: Joerg Roedel <joro@xxxxxxxxxx> > > >> Cc: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx> > > >> Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx> > > >> Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> > > >> --- > > >> drivers/iommu/arm-smmu-v3.c | 29 ++++++++++++++++++++-- > > >> drivers/iommu/arm-smmu.c | 29 ++++++++++++++++++++-- > > > > > > Looks reasonable to me. Please move the SMMU changes to a separate > > > patch and Cc the SMMU maintainers: > > > > Cheers Jean, I'll admit I've been skipping over a lot of these patches lately :) > > > > A couple of comments below... > > > > > > > > Cc: Will Deacon <will@xxxxxxxxxx> > > > Cc: Robin Murphy <robin.murphy@xxxxxxx> > > > > > > Thanks, > > > Jean > > > > > >> include/uapi/linux/iommu.h | 59 > > +++++++++++++++++++++++++++++++++++++++++++++ > > >> 3 files changed, 113 insertions(+), 4 deletions(-) > > >> > > >> diff --git a/drivers/iommu/arm-smmu-v3.c > > >> b/drivers/iommu/arm-smmu-v3.c index f578677..0c45d4d 100644 > > >> --- a/drivers/iommu/arm-smmu-v3.c > > >> +++ b/drivers/iommu/arm-smmu-v3.c > > >> @@ -3019,6 +3019,32 @@ static struct iommu_group > > *arm_smmu_device_group(struct device *dev) > > >> return group; > > >> } > > >> > > >> +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain > > *smmu_domain, > > >> + void *data) > > >> +{ > > >> + struct iommu_nesting_info *info = (struct iommu_nesting_info *) > data; > > >> + u32 size; > > >> + > > >> + if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED) > > >> + return -ENODEV; > > >> + > > >> + size = sizeof(struct iommu_nesting_info); > > >> + > > >> + /* > > >> + * if provided buffer size is not equal to the size, should > > >> + * return 0 and also the expected buffer size to caller. > > >> + */ > > >> + if (info->size != size) { > > >> + info->size = size; > > >> + return 0; > > >> + } > > >> + > > >> + /* report an empty iommu_nesting_info for now */ > > >> + memset(info, 0x0, size); > > >> + info->size = size; > > >> + return 0; > > >> +} > > >> + > > >> static int arm_smmu_domain_get_attr(struct iommu_domain *domain, > > >> enum iommu_attr attr, void *data) > > >> { > > >> @@ -3028,8 +3054,7 @@ static int arm_smmu_domain_get_attr(struct > > iommu_domain *domain, > > >> case IOMMU_DOMAIN_UNMANAGED: > > >> switch (attr) { > > >> case DOMAIN_ATTR_NESTING: > > >> - *(int *)data = (smmu_domain->stage == > > ARM_SMMU_DOMAIN_NESTED); > > >> - return 0; > > >> + return > arm_smmu_domain_nesting_info(smmu_domain, > > data); > > >> default: > > >> return -ENODEV; > > >> } > > >> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > > >> index 243bc4c..908607d 100644 > > >> --- a/drivers/iommu/arm-smmu.c > > >> +++ b/drivers/iommu/arm-smmu.c > > >> @@ -1506,6 +1506,32 @@ static struct iommu_group > > *arm_smmu_device_group(struct device *dev) > > >> return group; > > >> } > > >> > > >> +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain > > *smmu_domain, > > >> + void *data) > > >> +{ > > >> + struct iommu_nesting_info *info = (struct iommu_nesting_info *) > data; > > >> + u32 size; > > >> + > > >> + if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED) > > >> + return -ENODEV; > > >> + > > >> + size = sizeof(struct iommu_nesting_info); > > >> + > > >> + /* > > >> + * if provided buffer size is not equal to the size, should > > >> + * return 0 and also the expected buffer size to caller. > > >> + */ > > >> + if (info->size != size) { > > >> + info->size = size; > > >> + return 0; > > >> + } > > >> + > > >> + /* report an empty iommu_nesting_info for now */ > > >> + memset(info, 0x0, size); > > >> + info->size = size; > > >> + return 0; > > >> +} > > >> + > > >> static int arm_smmu_domain_get_attr(struct iommu_domain *domain, > > >> enum iommu_attr attr, void *data) > > >> { > > >> @@ -1515,8 +1541,7 @@ static int arm_smmu_domain_get_attr(struct > > iommu_domain *domain, > > >> case IOMMU_DOMAIN_UNMANAGED: > > >> switch (attr) { > > >> case DOMAIN_ATTR_NESTING: > > >> - *(int *)data = (smmu_domain->stage == > > ARM_SMMU_DOMAIN_NESTED); > > >> - return 0; > > >> + return > arm_smmu_domain_nesting_info(smmu_domain, > > data); > > >> default: > > >> return -ENODEV; > > >> } > > >> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h > > >> index 1afc661..898c99a 100644 > > >> --- a/include/uapi/linux/iommu.h > > >> +++ b/include/uapi/linux/iommu.h > > >> @@ -332,4 +332,63 @@ struct iommu_gpasid_bind_data { > > >> } vendor; > > >> }; > > >> > > >> +/* > > >> + * struct iommu_nesting_info - Information for nesting-capable > IOMMU. > > >> + * user space should check it before using > > >> + * nesting capability. > > >> + * > > >> + * @size: size of the whole structure > > >> + * @format: PASID table entry format, the same definition with > > >> + * @format of struct iommu_gpasid_bind_data. > > >> + * @features: supported nesting features. > > >> + * @flags: currently reserved for future extension. > > >> + * @data: vendor specific cap info. > > >> + * > > >> + * +---------------+----------------------------------------------------+ > > >> + * | feature | Notes | > > >> + * > > >> > > ++===============+============================================ > > ======= > > >> +=+ > > >> + * | SYSWIDE_PASID | Kernel manages PASID in system wide, PASIDs > used | > > >> + * | | in the system should be allocated by host kernel | > > >> + * +---------------+----------------------------------------------------+ > > >> + * | BIND_PGTBL | bind page tables to host PASID, the PASID could | > > >> + * | | either be a host PASID passed in bind request or | > > >> + * | | default PASIDs (e.g. default PASID of aux-domain) | > > >> + * +---------------+----------------------------------------------------+ > > >> + * | CACHE_INVLD | mandatory feature for nesting capable IOMMU > | > > >> + * > > >> ++---------------+--------------------------------------------------- > > >> +-+ > > >> + * > > >> + */ > > >> +struct iommu_nesting_info { > > >> + __u32 size; > > >> + __u32 format; > > >> + __u32 features; > > >> +#define IOMMU_NESTING_FEAT_SYSWIDE_PASID (1 << 0) > > >> +#define IOMMU_NESTING_FEAT_BIND_PGTBL (1 << 1) > > >> +#define IOMMU_NESTING_FEAT_CACHE_INVLD (1 << 2) > > >> + __u32 flags; > > >> + __u8 data[]; > > >> +}; > > >> + > > >> +/* > > >> + * struct iommu_nesting_info_vtd - Intel VT-d specific nesting info > > >> + * > > >> + * > > >> + * @flags: VT-d specific flags. Currently reserved for future > > >> + * extension. > > >> + * @addr_width: The output addr width of first level/stage translation > > >> + * @pasid_bits: Maximum supported PASID bits, 0 represents no > PASID > > >> + * support. > > >> + * @cap_reg: Describe basic capabilities as defined in VT-d > capability > > >> + * register. > > >> + * @ecap_reg: Describe the extended capabilities as defined in VT-d > > >> + * extended capability register. > > >> + */ > > >> +struct iommu_nesting_info_vtd { > > >> + __u32 flags; > > >> + __u16 addr_width; > > > > I think this might be worth promoting to a generic feature - Arm has the > same > > notion of intermediate address size, and I'd imagine that pretty much any > other > > two-stage translation system would as well (either explicitly or implicitly). > > It also > > comes close to something the DPDK folks raised where they wanted parity > with a > > feature that currently scrapes AGAW out of some VT-d-specific place, so > > abstracting it to completely generic code, in a way that could eventually be > > generalised to reporting info for non-nested domains too, would be really > nice. > > got you. I can do that. > > > What would also be cool is if the user was able to pass in a structure with > > preferred values for the address size and other capabilities when they > request > > nesting in the first place. Right now we'll always set up the maximum > possible > > sized page table for any domain, but if we knew ahead of time how many > bits the > > user actually cared about then we could potentially be more efficient (e.g. > use > > fewer levels of pagetable or a different translation granule). > > agreed, and I guess only the configurable caps (like the addr_width, domain > could have different addr_width per user request). I think it may be an > optimization afterward. Here, we report all the nesting related caps to user, > thus user could either do pre-check or expose correct capability to guest per > hardware support. This is necesary as nesting requires guest to maintain > page > tables per hw supporting. > yes, this likely requires a new uAPI thus it could come as an incremental patch later. We may reuse the same structure as defined here for communicating preferred values. Thanks Kevin