Re: [PATCH RFCv1 05/14] iommufd: Add IOMMUFD_OBJ_VIOMMU and IOMMUFD_CMD_VIOMMU_ALLOC

Nicolin Chen <nicolinc@xxxxxxxxxx> · Sun, 12 May 2024 21:33:59 -0700

On Sun, May 12, 2024 at 11:27:45AM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 12, 2024 at 08:47:02PM -0700, Nicolin Chen wrote:
> > +	viommu = iommu_dev->ops->viommu_alloc(idev->dev, cmd->type,
> > +					      hwpt_paging->common.domain);
> > +	if (IS_ERR(viommu)) {
> > +		rc = PTR_ERR(viommu);
> > +		goto out_put_hwpt;
> > +	}
> 
> Ah you did already include the S2, So should it be
> domain->viommu_alloc() then?

We can do that. In that case, the VIOMMU_ALLOC ioctl should be
simply per S2 HWPT too v.s. per IDEV.

> > +
> > +	/* iommufd_object_finalize will store the viommu->obj.id */
> > +	rc = xa_alloc(&ucmd->ictx->objects, &viommu->obj.id, XA_ZERO_ENTRY,
> > +		      xa_limit_31b, GFP_KERNEL_ACCOUNT);
> > +	if (rc)
> > +		goto out_free;
> > +
> > +	viommu->obj.type = IOMMUFD_OBJ_VIOMMU;
> 
> See my other notes, lets try not to open code this.

Ack.

> > +	viommu->type = cmd->type;
> > +
> > +	viommu->ictx = ucmd->ictx;
> > +	viommu->hwpt = hwpt_paging;
> > +	viommu->iommu_dev = idev->dev->iommu->iommu_dev;
> > +	cmd->out_viommu_id = viommu->obj.id;
> > +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> > +	if (rc)
> > +		goto out_erase_xa;
> > +	iommufd_object_finalize(ucmd->ictx, &viommu->obj);
> > +	refcount_inc(&viommu->hwpt->common.obj.users);
> > +	goto out_put_hwpt;
> > +
> > +out_erase_xa:
> > +	xa_erase(&ucmd->ictx->objects, viommu->obj.id);
> > +out_free:
> > +	if (viommu->ops && viommu->ops->free)
> > +		viommu->ops->free(viommu);
> > +	kfree(viommu);
> 
> This really should use the abort flow. The driver free callback has to
> be in the object release..

Yea, with the original object allocator, we probably can do abort().

> > +
> > +/**
> > + * enum iommu_viommu_type - VIOMMU Type
> > + * @IOMMU_VIOMMU_TEGRA241_CMDQV: NVIDIA Tegra241 CMDQV Extension for SMMUv3
> > + */
> > +enum iommu_viommu_type {
> > +	IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV,
> > +};
> 
> At least the 241 line should be in a following patch

It's for the "enum iommu_viommu_type" mentioned in the following
structure. Yi told me that you don't like an empty enum, and he
did something like this in HWPT_INVALIDATE series:
https://lore.kernel.org/linux-iommu/20240111041015.47920-3-yi.l.liu@xxxxxxxxx/

> > +/**
> > + * struct iommu_viommu_alloc - ioctl(IOMMU_VIOMMU_ALLOC)
> > + * @size: sizeof(struct iommu_viommu_alloc)
> > + * @flags: Must be 0
> > + * @type: Type of the VIOMMU object. Must be defined in enum iommu_viommu_type
> > + * @dev_id: The device to allocate this virtual IOMMU for
> > + * @hwpt_id: ID of a nested parent HWPT
> > + * @out_viommu_id: Output virtual IOMMU ID for the allocated object
> > + *
> > + * Allocate an virtual IOMMU object that holds a (shared) nested parent HWPT
> > + */
> > +struct iommu_viommu_alloc {
> > +	__u32 size;
> > +	__u32 flags;
> > +	__u32 type;
> > +	__u32 dev_id;
> > +	__u32 hwpt_id;
> > +	__u32 out_viommu_id;
> > +};
> 
> This seems fine.
> 
> Let's have a following patch to change the hwpt_alloc to accept the
> viommu as a hwpt as a uAPI change as well. 
> 
> The more I think about how this needs to work the more sure I am that
> we need to do that.
> 
> ARM will need a fairly tricky set of things to manage the VMID
> lifecycle. In BTM mode the VMID must come from the KVM. For vcmdq the
> VMID is needed to create the queue/viommu. For AMD the S2 is needed to
> create the VIOMMU in the first place.
> 
> So, to make this all work perfectly we need approx the following
>  - S2 sharing across instances in ARM - meaning the VMID is allocated
>    at attach not domain alloc
>  - S2 hwpt is refcounted by the VIOMMU in the iommufd layer
>  - VIOMMU is refcounted by every nesting child in the iommufd layer
>  - The nesting child holds a pointer to both the S2 and the VIOMMU
>    (viommu optional)
>  - When the nesting child attaches to a device the STE will source the
>    VMID from the VIOMMU if present otherwise from the S2
>  - "RID" attach (ie naked S2) will have to be done with a Nesting
>    Child using a vSTE that indicates Identity. Then the attach logic
>    will have enough information to get the VMID from the VIOMMU

What is this RID attach (naked S2) case? S1DSS_BYPASS + SVA?

>  - In full VIOMMU mode the S2 will never get a VMID of its own, it
>    will always use the VIOMMU. Life cycle is simple, the VMID is freed
>    when the VIOMMU is freed. That can't happen until all Nesting
>    Children are freed. That can't happen until all Nesting Children
>    are detached from devices. Detatching removes the HW touch of the VMID.

So, each VM will have one S2 HWPT/domain/iopt, but each VM can
have multiple VIOMMU instances sharing that single S2 HWPT, and
each VIOMMU instance (in the SMMU driver at least) holds a vmid.

This seems to be a quite clear big picture now!

> At this point you don't need the full generality, but let's please get
> ready and get the viommu pointer available in all the right spots and
> we can keep the current logic to borrow the VMID from the S2 for the
> VIOMMU.

Yea. Will try as much as I can.

Thanks
Nicolin