Re: [PATCH v3 3/4] vfio: Add VFIO_DEVICE_PASID_[AT|DE]TACH_IOMMUFD_PT

Alex Williamson <alex.williamson@xxxxxxxxxx> · Wed, 30 Oct 2024 11:51:42 -0600

On Wed, 30 Oct 2024 20:54:09 +0800
Yi Liu <yi.l.liu@xxxxxxxxx> wrote:

> Hi Alex,
> 
> On 2024/10/18 13:40, Yi Liu wrote:
> >>>> I think we need to monotonically increase the structure size,
> >>>> but maybe something more like below, using flags.  The expectation
> >>>> would be that if we add another flag that extends the structure, we'd
> >>>> test that flag after PASID and clobber xend to a new value further into
> >>>> the new structure.  We'd also add that flag to the flags mask, but we'd
> >>>> share the copy code.  
> >>>
> >>> agree, this share code might be needed for other path as well. Some macros
> >>> I guess.
> >>>  
> >>>>
> >>>>     if (attach.argsz < minsz)
> >>>>         return -EINVAL;
> >>>>
> >>>>     if (attach.flags & (~VFIO_DEVICE_ATTACH_PASID))
> >>>>         return -EINVAL;
> >>>>
> >>>>     if (attach.flags & VFIO_DEVICE_ATTACH_PASID)
> >>>>         xend = offsetofend(struct vfio_device_attach_iommufd_pt, pasid);
> >>>>
> >>>>     if (xend) {
> >>>>         if (attach.argsz < xend)
> >>>>             return -EINVAL;  
> 
> Need to check the future usage of 'xend'. In understanding, 'xend' should
> always be offsetofend(struct, the_last_field). A userspace that uses @pasid 
> field would set argsz >= offsetofend(struct, pasid), most likely it would
> just set argsz==offsetofend(struct, pasid). If so, such userspace would be
> failed when running on a kernel that has added new fields behind @pasid.

No, xend denotes the end of the structure we need to satisfy the flags
that are requested by the user.

> Say two decades later, we add a new field (say @xyz) to this user struct,
> the 'xend' would be updated to be offsetofend(struct, xyz). This 'xend'
> would be larger than the argsz provided by the aforementioned userspace.
> Hence it would be failed in the above check.

New field xyz would require a new flag, VFIO_DEVICE_XYZ and we'd extend
the above code as:

	if (attach.argsz < minsz)
		return -EINVAL;

	if (attach.flags & (~(VFIO_DEVICE_ATTACH_PASID |
			      VFIO_DEVICE_XYZ)))
		return -EINVAL;

	if (attach.flags & VFIO_DEVICE_ATTACH_PASID)
		xend = offsetofend(struct vfio_device_attach_iommufd_pt, pasid);

	if (attach.flags & VFIO_DEVICE_XYZ)
		xend = offsetofend(struct vfio_device_attach_iommufd_pt, xyz);

	if (xend) {
		if (attach.argsz < xend)
			return -EINVAL;  

New userspace can provide argsz = offsetofend(, xyz), just as it could
provide argsz = PAGE_SIZE now if it really wanted, but argsz > minsz is
only required if the user sets any of these new flags.  Therefore old
userspace on new kernel continues to work.

> To make it work, I'm
> considering to make some changes to the code. When argsz < xend, we only
> copy extra data with size==argsz-minsz. Just as the below.
> 
> 	if (xend) {
> 		unsigned long size;
> 
> 		if (attach.argsz < xend)

This is an -EINVAL condition, xend tracks the flags the user has set.
The user must provide a sufficient buffer for the flags they've set.

> 			size = attach.argsz - minsz;
> 		else
> 			size = xend - minsz;

This is the only correct copy size.

> 
> 		if (copy_from_user((void *)&attach + minsz,
> 				  (void __user *)arg + minsz, size))
> 			return -EFAULT;
> 	}
> 
> However, it seems to have another problem. If the userspace that uses
> @pasid set the argsz==offsetofend(struct, pasid) - 1, such userspace is
> not supposed to work and should be failed by kernel. is it? However, my
> above code cannot fail it. :(
> 
> Any suggestion about it?

If a user sets the ATTACH_PASID flag and argsz is less than
offsetofend(struct, pasid), we need to return -EINVAL as indicated
above.  Thanks,

Alex

> 
> >>>>
> >>>>         if (copy_from_user((void *)&attach + minsz,
> >>>>                     (void __user *)arg + minsz, xend - minsz))
> >>>>             return -EFAULT;
> >>>>     }  
> >>>  
>