On Tue, Aug 27, 2024 at 12:51:38PM -0300, Jason Gunthorpe wrote:
> For SMMUv3 a IOMMU_DOMAIN_NESTED is composed of a S2 iommu_domain acting
> as the parent and a user provided STE fragment that defines the CD table
> and related data with addresses translated by the S2 iommu_domain.
>
> The kernel only permits userspace to control certain allowed bits of the
> STE that are safe for user/guest control.
>
> IOTLB maintenance is a bit subtle here, the S1 implicitly includes the S2
> translation, but there is no way of knowing which S1 entries refer to a
> range of S2.
>
> For the IOTLB we follow ARM's guidance and issue a CMDQ_OP_TLBI_NH_ALL to
> flush all ASIDs from the VMID after flushing the S2 on any change to the
> S2.
>
> Similarly we have to flush the entire ATC if the S2 is changed.
>
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>

Reviewed-by: Nicolin Chen <nicolinc@xxxxxxxxxx>

With some small nits:

> @@ -2192,6 +2255,16 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>  	}
>  	__arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);
>
> +	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2 &&
> +	    smmu_domain->nest_parent) {

Isn't smmu_domain->nest_parent alone enough?

[---]

> +static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
> +				      struct device *dev)
> +{
[..]
> +	if (arm_smmu_ssids_in_use(&master->cd_table) ||

This feels more like an -EBUSY, as it is unlikely that the device could
attach to a different nested domain?

> +	    nested_domain->s2_parent->smmu != master->smmu)
> +		return -EINVAL;

[---]

> +static struct iommu_domain *
> +arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
> +			      struct iommu_domain *parent,
> +			      const struct iommu_user_data *user_data)
> +{
> +	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +	struct arm_smmu_nested_domain *nested_domain;
> +	struct arm_smmu_domain *smmu_parent;
> +	struct iommu_hwpt_arm_smmuv3 arg;
> +	unsigned int eats;
> +	unsigned int cfg;
> +	int ret;
> +
> +	if (!(master->smmu->features & ARM_SMMU_FEAT_NESTING))
> +		return ERR_PTR(-EOPNOTSUPP);
> +
> +	/*
> +	 * Must support some way to prevent the VM from bypassing the cache
> +	 * because VFIO currently does not do any cache maintenance.
> +	 */
> +	if (!(fwspec->flags & IOMMU_FWSPEC_PCI_RC_CANWBS) &&
> +	    !(master->smmu->features & ARM_SMMU_FEAT_S2FWB))
> +		return ERR_PTR(-EOPNOTSUPP);
> +
> +	ret = iommu_copy_struct_from_user(&arg, user_data,
> +					  IOMMU_HWPT_DATA_ARM_SMMUV3, ste);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	if (flags || !(master->smmu->features & ARM_SMMU_FEAT_TRANS_S1))
> +		return ERR_PTR(-EOPNOTSUPP);

This is a bit redundant with the first sanity check against
ARM_SMMU_FEAT_NESTING, since ARM_SMMU_FEAT_NESTING includes
ARM_SMMU_FEAT_TRANS_S1.

> +
> +	if (!(parent->type & __IOMMU_DOMAIN_PAGING))
> +		return ERR_PTR(-EINVAL);
> +
> +	smmu_parent = to_smmu_domain(parent);
> +	if (smmu_parent->stage != ARM_SMMU_DOMAIN_S2 ||

Maybe "!smmu_parent->nest_parent" instead.

[---]

> +	    smmu_parent->smmu != master->smmu)
> +		return ERR_PTR(-EINVAL);

It'd be slightly nicer if we did all the non-arg validations prior to
calling iommu_copy_struct_from_user(). Then the arg validations that
follow would sit closer to the copy().

> +
> +	/* EIO is reserved for invalid STE data. */
> +	if ((arg.ste[0] & ~STRTAB_STE_0_NESTING_ALLOWED) ||
> +	    (arg.ste[1] & ~STRTAB_STE_1_NESTING_ALLOWED))
> +		return ERR_PTR(-EIO);

[---]

>  /* The following are exposed for testing purposes. */
>  struct arm_smmu_entry_writer_ops;
>  struct arm_smmu_entry_writer {
> @@ -830,6 +849,7 @@ struct arm_smmu_master_domain {
>  	struct list_head devices_elm;
>  	struct arm_smmu_master *master;
>  	ioasid_t ssid;
> +	u8 nest_parent;

Would it be nicer to match the one in struct arm_smmu_domain:
+	bool nest_parent : 1;
?

> + * struct iommu_hwpt_arm_smmuv3 - ARM SMMUv3 Context Descriptor Table info
> + *                                (IOMMU_HWPT_DATA_ARM_SMMUV3)
> + *
> + * @ste: The first two double words of the user space Stream Table Entry for
> + *       a user stage-1 Context Descriptor Table. Must be little-endian.
> + *       Allowed fields: (Refer to "5.2 Stream Table Entry" in SMMUv3 HW Spec)
> + *       - word-0: V, Cfg, S1Fmt, S1ContextPtr, S1CDMax
> + *       - word-1: S1DSS, S1CIR, S1COR, S1CSH, S1STALLD

It seems that word-1 is missing EATS.

Thanks
Nicolin
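
For illustration only, the allowed-bits check quoted above, with EATS
counted as a permitted word-1 field, could be sketched in user space
roughly as below. Everything prefixed DEMO_/demo_ is hypothetical and is
not the kernel's definition: the bit positions are assumptions based on
a reading of the STE layout and should be checked against the SMMUv3
spec, and the sketch takes host-order words, so it skips the
little-endian conversion the real uAPI requires.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helpers: single bit n, and an inclusive bit range [h:l]. */
#define DEMO_BIT(n)      (UINT64_C(1) << (n))
#define DEMO_MASK(h, l)  ((~UINT64_C(0) >> (63 - (h))) & ~(DEMO_BIT(l) - 1))

/* Word-0 fields named in the uAPI comment (bit positions are assumed). */
#define DEMO_STE_0_V        DEMO_BIT(0)
#define DEMO_STE_0_CFG      DEMO_MASK(3, 1)
#define DEMO_STE_0_S1FMT    DEMO_MASK(5, 4)
#define DEMO_STE_0_S1CTXPTR DEMO_MASK(51, 6)
#define DEMO_STE_0_S1CDMAX  DEMO_MASK(63, 59)

/* Word-1 fields named in the uAPI comment, plus EATS (positions assumed). */
#define DEMO_STE_1_S1DSS    DEMO_MASK(1, 0)
#define DEMO_STE_1_S1CIR    DEMO_MASK(3, 2)
#define DEMO_STE_1_S1COR    DEMO_MASK(5, 4)
#define DEMO_STE_1_S1CSH    DEMO_MASK(7, 6)
#define DEMO_STE_1_S1STALLD DEMO_BIT(27)
#define DEMO_STE_1_EATS     DEMO_MASK(29, 28)

#define DEMO_STE_0_ALLOWED \
	(DEMO_STE_0_V | DEMO_STE_0_CFG | DEMO_STE_0_S1FMT | \
	 DEMO_STE_0_S1CTXPTR | DEMO_STE_0_S1CDMAX)
#define DEMO_STE_1_ALLOWED \
	(DEMO_STE_1_S1DSS | DEMO_STE_1_S1CIR | DEMO_STE_1_S1COR | \
	 DEMO_STE_1_S1CSH | DEMO_STE_1_S1STALLD | DEMO_STE_1_EATS)

/*
 * Return 0 if the user-supplied STE words only set allowed bits,
 * -EIO otherwise (mirroring "EIO is reserved for invalid STE data").
 */
static inline int demo_check_user_ste(const uint64_t ste[2])
{
	if ((ste[0] & ~DEMO_STE_0_ALLOWED) ||
	    (ste[1] & ~DEMO_STE_1_ALLOWED))
		return -EIO;
	return 0;
}
```

With these assumed masks, a fragment that sets only V in word-0 and EATS
in word-1 passes, while any bit outside the two masks is rejected with
-EIO before the fragment is used.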