Re: [PATCH v3 3/4] vfio: Add VFIO_DEVICE_PASID_[AT|DE]TACH_IOMMUFD_PT

Yi Liu <yi.l.liu@xxxxxxxxx> · Fri, 18 Oct 2024 13:40:13 +0800

On 2024/10/17 00:11, Alex Williamson wrote:
On Wed, 16 Oct 2024 11:35:05 +0800
Yi Liu <yi.l.liu@xxxxxxxxx> wrote:

On 2024/10/16 00:22, Alex Williamson wrote:
On Tue, 15 Oct 2024 19:07:52 +0800
Yi Liu <yi.l.liu@xxxxxxxxx> wrote:
   
On 2024/10/14 23:49, Alex Williamson wrote:
On Sat, 12 Oct 2024 21:49:05 +0800
Yi Liu <yi.l.liu@xxxxxxxxx> wrote:
      
On 2024/10/1 20:11, Jason Gunthorpe wrote:
On Mon, Sep 30, 2024 at 07:55:08AM +0000, Tian, Kevin wrote:
         
+struct vfio_device_pasid_attach_iommufd_pt {
+	__u32	argsz;
+	__u32	flags;
+	__u32	pasid;
+	__u32	pt_id;
+};
+
+#define VFIO_DEVICE_PASID_ATTACH_IOMMUFD_PT	_IO(VFIO_TYPE,
VFIO_BASE + 21)

Not sure whether this was discussed before. Does it make sense
to reuse the existing VFIO_DEVICE_ATTACH_IOMMUFD_PT
by introducing a new pasid field and a new flag bit?

Maybe? I don't have a strong feeling either way.

There is somewhat less code if you reuse the ioctl at least

I had a rough memory that I was suggested to add a separate ioctl for
PASID. Let's see Alex's opinion.

I don't recall any previous arguments for separate ioctls, but it seems
to make a lot of sense to me to extend the existing ioctls with a flag
to indicate pasid cscope and id.  Thanks,

thanks for the confirmation. How about the below?

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index bb1817bd4ff3..c78533bce3c6 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -162,21 +162,34 @@ void vfio_df_unbind_iommufd(struct vfio_device_file *df)
    int vfio_df_ioctl_attach_pt(struct vfio_device_file *df,
    			    struct vfio_device_attach_iommufd_pt __user *arg)
    {
-	struct vfio_device *device = df->device;
+	unsigned long minsz = offsetofend(
+			struct vfio_device_attach_iommufd_pt, pt_id);
    	struct vfio_device_attach_iommufd_pt attach;
-	unsigned long minsz;
+	struct vfio_device *device = df->device;
+	u32 user_size;
    	int ret;

-	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
+	ret = get_user(user_size, (u32 __user *)arg);
+	if (ret)
+		return ret;

-	if (copy_from_user(&attach, arg, minsz))
-		return -EFAULT;
+	ret = copy_struct_from_user(&attach, sizeof(attach), arg, user_size);
+	if (ret)
+		return ret;

I think this could break current users.

not quite get here. My understanding is as the below:

If the current user (compiled with the existing uapi struct), it will
provide less data that the new kernel knows. The copy_struct_from_user()
would zero the trailing bytes. And such user won't set the new flag, so
it should be fine.

You're making an assumption that the user is passing exactly the
existing struct size as argsz, which is not a requirement.  The user
could allocate a buffer page, argsz might be 4K.

I see. If the argsz is far larger than the existing struct size, it might
be large than the size of the new struct as well. Then the trailing bytes
would result in failure which does not fail with old user and old kernel.

So to me, the trailing bytes concern comes when user is compiled with the
new uapi struct but running on an eld kernel (say <= 6.12).But the eld
kernel uses copy_from_user(), so even there is non-zero trailing bytes in
the user buffer, the eld kernel just ignores them. So this seems not an
issue to me so far.

That's new userspace, old kernel.  I'm referencing an issue with old
userspace, new kernel, where old userspace has no requirement that
argsz is exactly the structure size, nor that the trailing bytes from
the structure size to argsz are zero'd.

got it.

For better or worse, we don't
currently have any requirements for the remainder of the user buffer,
whereas copy_struct_from_user() returns an error for non-zero trailing
bytes.

This seems to be a general requirement when using copy_struct_from_user().
User needs to zero the fields that it does not intend to use. If there is
no such requirement for user, then the trailing bytes can be a concern in
the future but not this time as the existing kernel uses copy_from_user().

Exactly, the current implementation makes no requirement on trailing
non-zero bytes, copy_struct_from_user() does.

got it.

I think we need to monotonically increase the structure size,
but maybe something more like below, using flags.  The expectation
would be that if we add another flag that extends the structure, we'd
test that flag after PASID and clobber xend to a new value further into
the new structure.  We'd also add that flag to the flags mask, but we'd
share the copy code.

agree, this share code might be needed for other path as well. Some macros
I guess.


	if (attach.argsz < minsz)
		return -EINVAL;

	if (attach.flags & (~VFIO_DEVICE_ATTACH_PASID))
		return -EINVAL;

	if (attach.flags & VFIO_DEVICE_ATTACH_PASID)
		xend = offsetofend(struct vfio_device_attach_iommufd_pt, pasid);

	if (xend) {
		if (attach.argsz < xend)
			return -EINVAL;
	
		if (copy_from_user((void *)&attach + minsz,
				    (void __user *)arg + minsz, xend - minsz))
			return -EFAULT;
	}

I think it might need to zero the trailing bytes if the user does not set
the extended flag. is it?

We could zero initialize the attach structure for safety, but the kernel
side code should also be driven by flags.  We should never look at a
field beyond the base structure that isn't indicated by flags and copied
from the user.

yes, zeroing the trailing bytes is for safety, and all the new fields
should be indicated by flags.

Maybe there are still more elegant options available.

We also generally try to label flags with FLAGS in the name, but it
does get rather unwieldy, so I'm open to suggestions.  Thanks,

There is already examples that new field added to a kernel-to-user uapi
struct like the vfio_region_info::cap_offset and
vfio_device_info::cap_offset. But it might be a little bit different
with the case we face here as it's user-to-kernel struct. It's a time for
you to set a rule for such extensions. :)

That's a returned structure, so it's a bit different, but the same
philosophy, we don't break userspace.  In that case we used argsz as an
output field and flags to indicate there are capabilities.  Old
userspace ignores the flag and argsz semantics and continues to work.
New userspace checks the flag, reallocs the buffer with argsz and
retries.  This is however an example of userspace providing an argsz
value that exceeds the ioctl structure, with no requirement to zero the
buffer, though it is an output structure rather than the input
structure here.

I think the same holds here, our policy is to not break userspace.  We
potentially do break userspace if we impose a requirement that the
trailing buffer size must now be zero.  We have the flags field so that
we don't need to blindly base the structure version on the size of the
user buffer.  We should use the flags field to determine relevant and
valid fields beyond the base structure without imposing new
requirements to userspace.  Thanks,

got it. let me send another version.

--
Regards,
Yi Liu