Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework

Alex Williamson <alex.williamson@xxxxxxxxxx> · Tue, 22 Nov 2011 12:16:32 -0700

On Fri, Nov 18, 2011 at 2:09 PM, Scott Wood <scottwood@xxxxxxxxxxxxx> wrote:
> On Fri, Nov 18, 2011 at 01:32:56PM -0700, Alex Williamson wrote:
>> Hmm, that might be cleaner than eliminating the size with just using
>> _IO().  So we might have something like:
>>
>> #define VFIO_IOMMU_MAP_DMA              _IOWR(';', 106, struct vfio_dma_map)
>> #define VFIO_IOMMU_MAP_DMA_V2           _IOWR(';', 106, struct vfio_dma_map_v2)
>>
>> For which the driver might do:
>>
>> case VFIO_IOMMU_MAP_DMA:
>> case VFIO_IOMMU_MAP_DMA_V2:
>> {
>>       struct vfio_dma_map map;
>>
>>       /* We don't care about the extra v2 bits */
>>       if (copy_from_user(&map, (void __user *)arg, sizeof map))
>>               return -EFAULT;
>
> That won't work if you have an old kernel that doesn't know about v2, and
> a new user that uses v2.  To make this work you'd have to strip out the
> size from the ioctl number before switching (but still use it when
> considering whether to access the v2 fields).  Simpler to just leave it
> out of the ioctl number and put it in the struct field as currently
> planned.

Ok, _IO for all ioctls passing structs then.

>> > > I think all our structure sizes are independent of host width.  If I'm
>> > > missing something, let me know.
>> >
>> > Ah, for structures, that might be true.  I was seeing the bunch of
>> > ioctl()s that take ints.
>>
>> Ugh, I suppose you're thinking of an ILP64 platform with ILP32 compat
>> mode.
>
> Does Linux support ILP64?  There are "int" ioctls all over the place, and
> I don't think we do compat wrappers for them.  In fact, some of the
> ioctls in linux/fs.h use "int" for the compatible version of ioctls
> originally defined as "long".
>
> It's cleaner to always use the fixed types, though.

I've updated anything that passes data to use a structure and will
make use of __s32 in place of ints.  If there ever exists an ILP64
system, we can use a flag bit of the structure to indicate 64bit file
descriptor support.

>> Darn it, guess we need to make everything 64bit, including file
>> descriptors.
>
> What's wrong with __u32/__s32 (or uint32_t/int32_t)?
>
> I really do not see Linux supporting an ABI that has no 32-bit type at
> all, especially in a situation where userspace compatibility is needed.
> If that does happen, the ABI breakage will go well beyond VFIO.

Yep, I think the structs fix this and still leave room for the impossible.

>> The point of the group is to provide a unit of ownership.  We can't let
>> $userA open $groupid and fetch a device, then have $userB do the same,
>> grabbing a different device.  The mappings will step on each other and
>> the devices have no isolation.  We can't restrict that purely by file
>> permissions or we'll have the same problem with sudo.
>
> What is the problem with sudo?  If you're running processes as the same
> user, regardless of how, they're going to be able to mess with each
> other.

Just trying to indicate that file permissions are easy to bypass and
privileged users can inadvertently do stupid stuff.  Kind of like
request_region() in the kernel.   Kernel drivers are privileged, but
we still want to enforce an owner of that region.  VFIO extends the
ownership of a device to a single entity in userspace.  How do we
identify that entity and keep others out?

> Is it possible to expose access to only specific groups via an
> individually-permissionable /dev/device, so only the entity handing out
> access to devices needs access to everything?

Yes, that's fundamental to vfio.  vfio-bus drivers enumerate devices
to the vfio-core.  Privileged users bind devices to the vfio-bus
driver creating viable groups.  Groups are represented as chardevs
under /dev/vfio.  If a user has permission to access the chardev, they
have the ability to use the devices.  Once they get a device or iommu
descriptor the group is tied to them via the struct mm and only they
are permitted to access the other devices in the group.

>> At one point we discussed a single open instance, but that
>> unnecessarily limits the user, so we settled on the mm.  Thanks,
>
> It would be nice if this limitation weren't excessively integrated into
> the design -- in the embedded space we've got unusual partitioning
> setups, including failover arrangements where partitions share devices.
> The device may be configured with the IOMMU pointing only at regions that
> are shared by both mms, or the non-shared regions may be reconfigured as
> active ownership of the device gets handed around.
>
> It would be up to userspace code to make sure that the mappings don't
> "step on each other".  The mapping could be done with whichever mm issued
> the map call for a given region.
>
> For this use case, there is unlikely to be an issue with ownership
> because there will not be separate privilege domains creating partitions
> -- other use cases could refrain from enabling multiple-mm support unless
> ownership issues are resolved.
>
> This doesn't need to be supported initially, but we should try to avoid
> letting the assumption permeate the code.

So I'm hearing "we want to use this driver you're developing that's
centered around using the iommu to securely provide access to a device
from userspace, but can we do it without the iommu and can we loosen
up the security a bit?"  Is that about right?  ;)  Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html