RE: [PATCH v5 5/9] iommufd: Add iommufd fault object

> From: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> Sent: Tuesday, April 30, 2024 10:57 PM
> 
> @@ -131,6 +131,9 @@ struct iopf_group {
>  	struct iommu_attach_handle *attach_handle;
>  	/* The device's fault data parameter. */
>  	struct iommu_fault_param *fault_param;
> +	/* Used by handler provider to hook the group on its own lists. */
> +	struct list_head node;
> +	u32 cookie;

Better to put these together with attach_handle.

Also rename 'node' to 'handle_node'.
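
Something like this sketch of the suggested ordering:

	struct iopf_group {
		...
		struct iommu_attach_handle *attach_handle;
		/* Used by handler provider to hook the group on its own lists. */
		struct list_head handle_node;
		u32 cookie;
		/* The device's fault data parameter. */
		struct iommu_fault_param *fault_param;
		...
	};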

> @@ -128,6 +128,7 @@ enum iommufd_object_type {
>  	IOMMUFD_OBJ_HWPT_NESTED,
>  	IOMMUFD_OBJ_IOAS,
>  	IOMMUFD_OBJ_ACCESS,
> +	IOMMUFD_OBJ_FAULT,

Agree with Jason that 'FAULT_QUEUE' sounds like a clearer object name.

> @@ -395,6 +396,8 @@ struct iommufd_device {
>  	/* always the physical device */
>  	struct device *dev;
>  	bool enforce_cache_coherency;
> +	/* outstanding faults awaiting response indexed by fault group id */
> +	struct xarray faults;

this...

> +struct iommufd_fault {
> +	struct iommufd_object obj;
> +	struct iommufd_ctx *ictx;
> +	struct file *filep;
> +
> +	/* The lists of outstanding faults protected by below mutex. */
> +	struct mutex mutex;
> +	struct list_head deliver;
> +	struct list_head response;

...and these lists are worth a discussion.

First, the response list is not used. If we continue with the choice of
queueing faults per device, it should be removed.

But I wonder whether it makes more sense to keep this response
queue per fault object. That sounds simpler to me.
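
A minimal sketch, assuming the pending responses move from the
per-device xarray into the fault object and are indexed by cookie:

	struct iommufd_fault {
		struct iommufd_object obj;
		struct iommufd_ctx *ictx;
		struct file *filep;

		/* The list of delivered faults protected by below mutex. */
		struct mutex mutex;
		struct list_head deliver;
		/* Outstanding faults awaiting response, indexed by cookie. */
		struct xarray response;
	};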

Also, it's unclear why the response message needs to carry the
same info as the request when only id/code/cookie are actually used:

+struct iommu_hwpt_page_response {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 pasid;
+	__u32 grpid;
+	__u32 code;
+	__u32 cookie;
+	__u32 reserved;
+};

If we keep the response queue in the fault object, the response message
only needs to carry size/flags/code/cookie, and the cookie alone can
uniquely identify the pending message in the response queue.
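
i.e. something like this trimmed layout (just a sketch; the field
meanings are unchanged from the proposal above):

	struct iommu_hwpt_page_response {
		__u32 size;
		__u32 flags;
		__u32 code;
		__u32 cookie;
	};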

> +static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf,
> +					size_t count, loff_t *ppos)
> +{
> +	size_t response_size = sizeof(struct iommu_hwpt_page_response);
> +	struct iommufd_fault *fault = filep->private_data;
> +	struct iommu_hwpt_page_response response;
> +	struct iommufd_device *idev = NULL;
> +	struct iopf_group *group;
> +	size_t done = 0;
> +	int rc;
> +
> +	if (*ppos || count % response_size)
> +		return -ESPIPE;
> +
> +	mutex_lock(&fault->mutex);
> +	while (count > done) {
> +		rc = copy_from_user(&response, buf + done, response_size);
> +		if (rc)
> +			break;
> +
> +		if (!idev || idev->obj.id != response.dev_id)
> +			idev = container_of(iommufd_get_object(fault->ictx,
> +							       response.dev_id,
> +							       IOMMUFD_OBJ_DEVICE),
> +					    struct iommufd_device, obj);
> +		if (IS_ERR(idev))
> +			break;
> +
> +		group = xa_erase(&idev->faults, response.cookie);
> +		if (!group)
> +			break;

Is 'continue' better here?
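
e.g. a sketch (ignoring the get/put pairing issue noted below; note
'done' must still advance, otherwise the loop never terminates):

		group = xa_erase(&idev->faults, response.cookie);
		if (!group) {
			/* Skip a stale/unknown cookie instead of aborting the batch. */
			done += response_size;
			continue;
		}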

> +
> +		iopf_group_response(group, response.code);

The PCIe spec states that a Response Failure disables the PRI interface.
For SR-IOV it'd be dangerous to allow the user to send such a code to a
VF, as that would close the entire shared PRI interface.

This is just another example of the lack of coordination for shared
capabilities between PF and VF, but exposing such a gap to userspace
makes it worse.

I guess we don't want to make this work depend on that cleanup. The
minimal correct thing is to disallow attaching a VF to a fault-capable
hwpt, with a note here that once we turn on support for VFs the Response
Failure code should not be forwarded to the hardware. Instead it's an
indication that the user cannot serve more requests, and such a
situation waits for a vPRI reset to recover.
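
A minimal sketch of such a check at attach time (assuming 'hwpt->fault'
marks a fault-capable hwpt in this series; 'is_virtfn' is the existing
pci_dev flag):

	if (hwpt->fault && dev_is_pci(idev->dev) &&
	    to_pci_dev(idev->dev)->is_virtfn)
		return -EOPNOTSUPP;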

> +		iopf_free_group(group);
> +		done += response_size;
> +
> +		iommufd_put_object(fault->ictx, &idev->obj);

get/put is unpaired:

		if (!idev || idev->obj.id != response.dev_id)
			idev = iommufd_get_object();

		...

		iommufd_put_object(idev);

The intention might be to reuse idev when multiple fault responses are
for the same idev. But idev is put in every iteration, so the following
messages will access the idev w/o holding the reference.
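
A rough sketch of one way to pair them, putting the previous idev only
when dev_id changes and dropping the last reference after the loop:

	while (count > done) {
		rc = copy_from_user(&response, buf + done, response_size);
		if (rc)
			break;

		if (!idev || idev->obj.id != response.dev_id) {
			if (idev)
				iommufd_put_object(fault->ictx, &idev->obj);
			idev = container_of(iommufd_get_object(fault->ictx,
							       response.dev_id,
							       IOMMUFD_OBJ_DEVICE),
					    struct iommufd_device, obj);
			if (IS_ERR(idev)) {
				idev = NULL;
				break;
			}
		}

		group = xa_erase(&idev->faults, response.cookie);
		if (!group)
			break;

		iopf_group_response(group, response.code);
		iopf_free_group(group);
		done += response_size;
	}
	if (idev)
		iommufd_put_object(fault->ictx, &idev->obj);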
