Re: KVM PCI device assignment issues

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Matthew Wilcox (matthew@xxxxxx) wrote:
> On Fri, Feb 13, 2009 at 04:32:47PM +0000, Mark McLoughlin wrote:
> >   - Conventional PCI devices (i.e. PCI/PCI-X, not PCIe) behind the same 
> >     bridge must be assigned to the same VT-d domain - i.e given device 
> >     A (0000:0f:1.0) and device B (and 0000:0f:2.0), if you assign 
> >     device A to guest, you cannot then use device B in the host or 
> >     another guest.
> 
> Is that a limitation of the VT-d / IOMMU setup?

Yes.  The source id will essentially show up as the bridge.

> >   - Some newer PCIe devices (and newer conventional PCI devices too via 
> >     PCI Advanced Features) support Function Level Reset (FLR). This 
> >     allows a PCI function to be reset without affecting any other 
> >     functions on that device, or any other devices. This feature is not 
> >     widespread yet AFAIK - e.g. I've seen it on an audio controller, 
> >     and it must also be supported by SR-IOV devices.
> 
> Yes, that's definitely not very widespread yet.  OTOH, we don't need to
> worry about disturbing other functions if all devices behind the same
> bridge have to be mapped to the same guest.

FLR (when it exists) would work fine for devices not behind a conventional
pci bridge.

> > Driver Unbinding
> > ================
> > 
> > Before a device is assigned to a guest, we should make sure that no host
> > device driver is currently bound to the device.
> > 
> > We can do that with e.g.
> > 
> >  $> echo -n "8086 10de"  > /sys/bus/pci/drivers/pci-stub/new_id
> >  $> echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind
> >  $> echo -n 0000:00:19.0 > /sys/bus/pci/drivers/pci-stub/bind
> > 
> > One minor problem with this scheme is that at this point you can't
> > unbind from pci-stub and trigger a re-probe and have e1000e bind to it.
> > In order to support that, we need a "remove_id" interface to remove the
> > dynamic ID.
> 
> It sounds like you'd be OK with a 'remove_id' interface that only
> removes subsequently-added interfaces.
> 
> I might suggest a second approach which would be to have an explicit
> echo to the bind file ignore the list of ids.  Then you wouldn't need to
> 'echo -n "8086 10de"' to begin with.

I tried that first, and it dips into the driver logic, where it wants
to filter via ->match.  Untested patch below _should_ be enough to avoid
adding the id to begin with.

> > Furthermore, it should be possible to do this without actually affecting
> > any of the devices - i.e. a "try to unbind and see if we oops" approach
> > clearly isn't great.
> 
> Well, yes.  I'd even be upset if my network or storage flickered away
> briefly while another using was starting to run KVM.
> 
> > This last constraint is the most difficult and points to the logic
> > needing to be in userland management libraries. Possibly the only sane
> > kernel space support would be "try to unbind and reset; if it works then
> > the device is assignable".
> 
> If we expose a 'reset' file in the /sys/bus/pci/devices/*/ directories
> for devices that are resettable, that should be enough, I would think.

Yes, currently it's only internal and it's not robust given the reset
constraints.

thanks,
-chris
--

 drivers/base/base.h |    1 +
 drivers/base/bus.c  |    2 +-
 drivers/base/dd.c   |   15 +++++++++++++--
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 0a5f055..60dc346 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -86,6 +86,7 @@ extern void bus_remove_driver(struct device_driver *drv);
 
 extern void driver_detach(struct device_driver *drv);
 extern int driver_probe_device(struct device_driver *drv, struct device *dev);
+extern int driver_bind_probe_device(struct device_driver *drv, struct device *dev);
 
 extern void sysdev_shutdown(void);
 extern int sysdev_suspend(pm_message_t state);
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 83f32b8..ad28338 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -202,7 +202,7 @@ static ssize_t driver_bind(struct device_driver *drv,
 		if (dev->parent)	/* Needed for USB */
 			down(&dev->parent->sem);
 		down(&dev->sem);
-		err = driver_probe_device(drv, dev);
+		err = driver_bind_probe_device(drv, dev);
 		up(&dev->sem);
 		if (dev->parent)
 			up(&dev->parent->sem);
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 315bed8..fba6463 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -184,13 +184,14 @@ int driver_probe_done(void)
  * This function must be called with @dev->sem held.  When called for a
  * USB interface, @dev->parent->sem must be held as well.
  */
-int driver_probe_device(struct device_driver *drv, struct device *dev)
+static int __driver_probe_device(struct device_driver *drv, struct device *dev,
+				 bool force)
 {
 	int ret = 0;
 
 	if (!device_is_registered(dev))
 		return -ENODEV;
-	if (drv->bus->match && !drv->bus->match(dev, drv))
+	if (!force && drv->bus->match && !drv->bus->match(dev, drv))
 		goto done;
 
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
@@ -202,6 +203,16 @@ done:
 	return ret;
 }
 
+int driver_probe_bind_device(struct device_driver *drv, struct device *dev)
+{
+	return __driver_probe_device(drv, dev, 1);
+}
+
+int driver_probe_device(struct device_driver *drv, struct device *dev)
+{
+	return __driver_probe_device(drv, dev, 0);
+}
+
 static int __device_attach(struct device_driver *drv, void *data)
 {
 	struct device *dev = data;
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux