Re: kvm PCI assignment & VFIO ramblings

Alex Williamson <alex.williamson@xxxxxxxxxx> · Tue, 23 Aug 2011 11:08:29 -0600

On Tue, 2011-08-23 at 15:14 +0200, Roedel, Joerg wrote:
> On Mon, Aug 22, 2011 at 03:17:00PM -0400, Alex Williamson wrote:
> > On Mon, 2011-08-22 at 19:25 +0200, Joerg Roedel wrote:
> 
> > > I am in favour of /dev/vfio/$GROUP. If multiple devices should be
> > > assigned to a guest, there can also be an ioctl to bind a group to the
> > > address-space of another group (this certainly needs some care to
> > > ensure that the two groups do not belong to different processes).
> > 
> > That's an interesting idea.  Maybe an interface similar to the current
> > uiommu interface, where you open() the 2nd group fd and pass the fd via
> > ioctl to the primary group.  IOMMUs that don't support this would fail
> > the attach device callback, which would fail the ioctl to bind them.  It
> > will need to be designed so any group can be removed from the super-set
> > and the remaining group(s) still works.  This feels like something that
> > can be added after we get an initial implementation.
> 
> Handling it through fds is a good idea. This makes sure that everything
> belongs to one process. I am not really sure yet whether we should just
> bind plain groups together or create meta-groups. The meta-groups
> approach seems somewhat cleaner, though.

I'm leaning towards binding because we need to make it dynamic, but I
don't really have a good picture of the lifecycle of a meta-group.
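
As a rough userspace model of the binding idea discussed above (not
kernel code, and not an existing VFIO interface): every name here,
including struct group, struct domain, group_bind() and group_unbind(),
is made up for illustration.  The sketch assumes a refcounted address
space per group, refuses to bind groups owned by different processes,
and lets any group, including the primary, leave the super-set without
disturbing the rest.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a shared IOMMU address space. */
struct domain {
    int refs;               /* number of groups using this domain */
};

/* Hypothetical model of a device group opened by one process. */
struct group {
    int owner_pid;          /* process that holds the group fd */
    struct domain *dom;     /* address space currently in use */
};

static void domain_put(struct domain *d)
{
    assert(d->refs > 0);
    if (--d->refs == 0)
        free(d);
}

/* Give a freshly opened group its own private domain. */
static int group_init(struct group *g, int owner_pid)
{
    g->dom = malloc(sizeof(*g->dom));
    if (!g->dom)
        return -1;
    g->dom->refs = 1;
    g->owner_pid = owner_pid;
    return 0;
}

/* Bind 'g' into the address space of 'primary'; 0 on success, -1 on failure. */
static int group_bind(struct group *primary, struct group *g)
{
    if (primary->owner_pid != g->owner_pid)
        return -1;          /* groups must belong to the same process */
    if (g->dom == primary->dom)
        return 0;           /* already sharing */
    if (g->dom->refs != 1)
        return -1;          /* 'g' is part of another super-set */
    domain_put(g->dom);
    g->dom = primary->dom;
    g->dom->refs++;
    return 0;
}

/* Remove 'g' from a super-set; the remaining groups keep the old domain. */
static int group_unbind(struct group *g)
{
    struct domain *fresh;

    if (g->dom->refs == 1)
        return 0;           /* not shared, nothing to do */
    fresh = malloc(sizeof(*fresh));
    if (!fresh)
        return -1;
    fresh->refs = 1;
    domain_put(g->dom);
    g->dom = fresh;
    return 0;
}

/* Drop the group's reference when its fd is closed. */
static void group_release(struct group *g)
{
    domain_put(g->dom);
    g->dom = NULL;
}
```

Because the domain is refcounted rather than owned by the first group,
there is no special "primary" once the bind has happened, which is what
makes dynamic removal straightforward in this model.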

> > > Btw, a problem we haven't talked about at all yet is
> > > driver-deassignment. User space can decide to de-assign the device from
> > > vfio while a fd is open on it. With PCI there is no way to let this fail
> > > (the .release function returns void last time I checked). Is this a
> > > problem, and if yes, how do we handle that?
> > 
> > The current vfio has the same problem, we can't unbind a device from
> > vfio while it's attached to a guest.  I think we'd use the same solution
> > too; send out a netlink packet for a device removal and have the .remove
> > call sleep on a wait_event(, refcnt == 0).  We could also set a timeout
> > and SIGBUS the PIDs holding the device if they don't return it
> > willingly.  Thanks,
> 
> Putting the process to sleep (which would be uninterruptible) seems bad.
> The process would sleep until the guest releases the device-group, which
> can take days or months.
> The best thing (and the most intrusive :-) ) is to change PCI core to
> allow unbindings to fail, I think. But this probably further complicates
> the path to upstreaming VFIO...

Yes, it's not ideal, but I think it's sufficient for now.  If we later
get support for returning an error from release, we can make use of that
by setting a timeout after notifying the user.  Thanks,

Alex
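
For reference, a minimal userspace sketch of the "wait until refcnt ==
0, with a timeout" pattern discussed in this thread.  It uses POSIX
threads primitives to stand in for the kernel's wait queue, so it is
only an analogy; struct dev_state, dev_put() and dev_wait_released()
are hypothetical names, not part of vfio or the PCI core.  On timeout
the caller would decide how to escalate (for example the SIGBUS idea
mentioned earlier in the thread).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical per-device state: users hold references until they let go. */
struct dev_state {
    pthread_mutex_t lock;
    pthread_cond_t released;    /* signalled when refcnt drops to zero */
    int refcnt;
};

static void dev_state_init(struct dev_state *d, int refcnt)
{
    pthread_mutex_init(&d->lock, NULL);
    pthread_cond_init(&d->released, NULL);
    d->refcnt = refcnt;
}

/* A user gives the device back. */
static void dev_put(struct dev_state *d)
{
    pthread_mutex_lock(&d->lock);
    assert(d->refcnt > 0);
    if (--d->refcnt == 0)
        pthread_cond_broadcast(&d->released);
    pthread_mutex_unlock(&d->lock);
}

/*
 * Remove path: sleep until every user has released the device or the
 * timeout expires.  Returns 0 if refcnt reached zero, -1 on timeout.
 */
static int dev_wait_released(struct dev_state *d, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    /* Build an absolute deadline for pthread_cond_timedwait(). */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&d->lock);
    while (d->refcnt != 0 && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&d->released, &d->lock, &deadline);
    rc = (d->refcnt == 0) ? 0 : -1;
    pthread_mutex_unlock(&d->lock);
    return rc;
}
```

With a bounded wait, the remove path no longer blocks for as long as
the guest chooses to hold the device, at the cost of needing a policy
for what happens when the timeout fires.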
