Re: [RFC PATCH] vfio: VFIO Driver core framework

Scott Wood <scottwood@xxxxxxxxxxxxx> · Fri, 18 Nov 2011 15:09:52 -0600

On Fri, Nov 18, 2011 at 01:32:56PM -0700, Alex Williamson wrote:
> Hmm, that might be cleaner than eliminating the size with just using
> _IO().  So we might have something like:
> 
> #define VFIO_IOMMU_MAP_DMA              _IOWR(';', 106, struct vfio_dma_map)
> #define VFIO_IOMMU_MAP_DMA_V2           _IOWR(';', 106, struct vfio_dma_map_v2)
> 
> For which the driver might do:
> 
> case VFIO_IOMMU_MAP_DMA:
> case VFIO_IOMMU_MAP_DMA_V2:
> {
> 	struct vfio_dma_map map;
> 
> 	/* We don't care about the extra v2 bits */
> 	if (copy_from_user(&map, (void __user *)arg, sizeof map))
> 		return -EFAULT;

That won't work if you have an old kernel that doesn't know about v2, and
a new user that uses v2.  To make this work you'd have to strip out the
size from the ioctl number before switching (but still use it when
considering whether to access the v2 fields).  Simpler to just leave it
out of the ioctl number and put it in the struct field as currently
planned.

> > > I think all our structure sizes are independent of host width.  If I'm
> > > missing something, let me know.
> > 
> > Ah, for structures, that might be true.  I was seeing the bunch of
> > ioctl()s that take ints.
> 
> Ugh, I suppose you're thinking of an ILP64 platform with ILP32 compat
> mode.

Does Linux support ILP64?  There are "int" ioctls all over the place, and
I don't think we do compat wrappers for them.  In fact, some of the
ioctls in linux/fs.h use "int" for the compatible version of ioctls
originally defined as "long".

It's cleaner to always use the fixed types, though.

> Darn it, guess we need to make everything 64bit, including file
> descriptors.

What's wrong with __u32/__s32 (or uint32_t/int32_t)?

I really do not see Linux supporting an ABI that has no 32-bit type at
all, especially in a situation where userspace compatibility is needed. 
If that does happen, the ABI breakage will go well beyond VFIO.

> The point of the group is to provide a unit of ownership.  We can't let
> $userA open $groupid and fetch a device, then have $userB do the same,
> grabbing a different device.  The mappings will step on each other and
> the devices have no isolation.  We can't restrict that purely by file
> permissions or we'll have the same problem with sudo.

What is the problem with sudo?  If you're running processes as the same
user, regardless of how, they're going to be able to mess with each
other.

Is it possible to expose access to only specific groups via an
individually-permissionable /dev/device, so only the entity handing out
access to devices needs access to everything?

> At one point we discussed a single open instance, but that
> unnecessarily limits the user, so we settled on the mm.  Thanks,

It would be nice if this limitation weren't excessively integrated into
the design -- in the embedded space we've got unusual partitioning
setups, including failover arrangements where partitions share devices. 
The device may be configured with the IOMMU pointing only at regions that
are shared by both mms, or the non-shared regions may be reconfigured as
active ownership of the device gets handed around.

It would be up to userspace code to make sure that the mappings don't
"step on each other".  The mapping could be done with whichever mm issued
the map call for a given region.

For this use case, there is unlikely to be an issue with ownership
because there will not be separate privilege domains creating partitions
-- other use cases could refrain from enabling multiple-mm support unless
ownership issues are resolved.

This doesn't need to be supported initially, but we should try to avoid
letting the assumption permeate the code.

-Scott

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html