On Sun, May 12, 2024 at 11:27:45AM -0300, Jason Gunthorpe wrote: > On Fri, Apr 12, 2024 at 08:47:02PM -0700, Nicolin Chen wrote: > > + viommu = iommu_dev->ops->viommu_alloc(idev->dev, cmd->type, > > + hwpt_paging->common.domain); > > + if (IS_ERR(viommu)) { > > + rc = PTR_ERR(viommu); > > + goto out_put_hwpt; > > + } > > Ah you did already include the S2, So should it be > domain->viommu_alloc() then? We can do that. In that case, the VIOMMU_ALLOC ioctl should be simply per S2 HWPT too v.s. per IDEV. > > + > > + /* iommufd_object_finalize will store the viommu->obj.id */ > > + rc = xa_alloc(&ucmd->ictx->objects, &viommu->obj.id, XA_ZERO_ENTRY, > > + xa_limit_31b, GFP_KERNEL_ACCOUNT); > > + if (rc) > > + goto out_free; > > + > > + viommu->obj.type = IOMMUFD_OBJ_VIOMMU; > > See my other notes, lets try not to open code this. Ack. > > + viommu->type = cmd->type; > > + > > + viommu->ictx = ucmd->ictx; > > + viommu->hwpt = hwpt_paging; > > + viommu->iommu_dev = idev->dev->iommu->iommu_dev; > > + cmd->out_viommu_id = viommu->obj.id; > > + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); > > + if (rc) > > + goto out_erase_xa; > > + iommufd_object_finalize(ucmd->ictx, &viommu->obj); > > + refcount_inc(&viommu->hwpt->common.obj.users); > > + goto out_put_hwpt; > > + > > +out_erase_xa: > > + xa_erase(&ucmd->ictx->objects, viommu->obj.id); > > +out_free: > > + if (viommu->ops && viommu->ops->free) > > + viommu->ops->free(viommu); > > + kfree(viommu); > > This really should use the abort flow. The driver free callback has to > be in the object release.. Yea, with the original object allocator, we probably can do abort(). > > + > > +/** > > + * enum iommu_viommu_type - VIOMMU Type > > + * @IOMMU_VIOMMU_TEGRA241_CMDQV: NVIDIA Tegra241 CMDQV Extension for SMMUv3 > > + */ > > +enum iommu_viommu_type { > > + IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV, > > +}; > > At least the 241 line should be in a following patch It's for the "enum iommu_viommu_type" mentioned in the following structure. Yi told me that you don't like an empty enum, and he did something like this in HWPT_INVALIDATE series: https://lore.kernel.org/linux-iommu/20240111041015.47920-3-yi.l.liu@xxxxxxxxx/ > > +/** > > + * struct iommu_viommu_alloc - ioctl(IOMMU_VIOMMU_ALLOC) > > + * @size: sizeof(struct iommu_viommu_alloc) > > + * @flags: Must be 0 > > + * @type: Type of the VIOMMU object. Must be defined in enum iommu_viommu_type > > + * @dev_id: The device to allocate this virtual IOMMU for > > + * @hwpt_id: ID of a nested parent HWPT > > + * @out_viommu_id: Output virtual IOMMU ID for the allocated object > > + * > > + * Allocate an virtual IOMMU object that holds a (shared) nested parent HWPT > > + */ > > +struct iommu_viommu_alloc { > > + __u32 size; > > + __u32 flags; > > + __u32 type; > > + __u32 dev_id; > > + __u32 hwpt_id; > > + __u32 out_viommu_id; > > +}; > > This seems fine. > > Let's have a following patch to change the hwpt_alloc to accept the > viommu as a hwpt as a uAPI change as well. > > The more I think about how this needs to work the more sure I am that > we need to do that. > > ARM will need a fairly tricky set of things to manage the VMID > lifecycle. In BTM mode the VMID must come from the KVM. For vcmdq the > VMID is needed to create the queue/viommu. For AMD the S2 is needed to > create the VIOMMU in the first place. > > So, to make this all work perfectly we need approx the following > - S2 sharing across instances in ARM - meaning the VMID is allocated > at attach not domain alloc > - S2 hwpt is refcounted by the VIOMMU in the iommufd layer > - VIOMMU is refcounted by every nesting child in the iommufd layer > - The nesting child holds a pointer to both the S2 and the VIOMMU > (viommu optional) > - When the nesting child attaches to a device the STE will source the > VMID from the VIOMMU if present otherwise from the S2 > - "RID" attach (ie naked S2) will have to be done with a Nesting > Child using a vSTE that indicates Identity. Then the attach logic > will have enough information to get the VMID from the VIOMMU What is this RID attach (naked S2) case? S1DSS_BYPASS + SVA? > - In full VIOMMU mode the S2 will never get a VMID of its own, it > will always use the VIOMMU. Life cycle is simple, the VMID is freed > when the VIOMMU is freed. That can't happen until all Nesting > Children are freed. That can't happen until all Nesting Children > are detached from devices. Detatching removes the HW touch of the VMID. So, each VM will have one S2 HWPT/domain/iopt, but each VM can have multiple VIOMMU instances sharing that single S2 HWPT, and each VIOMMU instance (in the SMMU driver at least) holds a vmid. This seems to be a quite clear big picture now! > At this point you don't need the full generality, but let's please get > ready and get the viommu pointer available in all the right spots and > we can keep the current logic to borrow the VMID from the S2 for the > VIOMMU. Yea. Will try as much as I can. Thanks Nicolin