* Matthew Wilcox (matthew@xxxxxx) wrote: > On Fri, Feb 13, 2009 at 04:32:47PM +0000, Mark McLoughlin wrote: > > - Conventional PCI devices (i.e. PCI/PCI-X, not PCIe) behind the same > > bridge must be assigned to the same VT-d domain - i.e given device > > A (0000:0f:1.0) and device B (and 0000:0f:2.0), if you assign > > device A to guest, you cannot then use device B in the host or > > another guest. > > Is that a limitation of the VT-d / IOMMU setup? Yes. The source id will essentially show up as the bridge. > > - Some newer PCIe devices (and newer conventional PCI devices too via > > PCI Advanced Features) support Function Level Reset (FLR). This > > allows a PCI function to be reset without affecting any other > > functions on that device, or any other devices. This feature is not > > widespread yet AFAIK - e.g. I've seen it on an audio controller, > > and it must also be supported by SR-IOV devices. > > Yes, that's definitely not very widespread yet. OTOH, we don't need to > worry about disturbing other functions if all devices behind the same > bridge have to be mapped to the same guest. FLR (when it exists) would work fine for devices not behind a conventional pci bridge. > > Driver Unbinding > > ================ > > > > Before a device is assigned to a guest, we should make sure that no host > > device driver is currently bound to the device. > > > > We can do that with e.g. > > > > $> echo -n "8086 10de" > /sys/bus/pci/drivers/pci-stub/new_id > > $> echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind > > $> echo -n 0000:00:19.0 > /sys/bus/pci/drivers/pci-stub/bind > > > > One minor problem with this scheme is that at this point you can't > > unbind from pci-stub and trigger a re-probe and have e1000e bind to it. > > In order to support that, we need a "remove_id" interface to remove the > > dynamic ID. > > It sounds like you'd be OK with a 'remove_id' interface that only > removes subsequently-added interfaces. > > I might suggest a second approach which would be to have an explicit > echo to the bind file ignore the list of ids. Then you wouldn't need to > 'echo -n "8086 10de"' to begin with. I tried that first, and it dips into the driver logic, where it wants to filter via ->match. Untested patch below _should_ be enough to avoid adding the id to begin with. > > Furthermore, it should be possible to do this without actually affecting > > any of the devices - i.e. a "try to unbind and see if we oops" approach > > clearly isn't great. > > Well, yes. I'd even be upset if my network or storage flickered away > briefly while another using was starting to run KVM. > > > This last constraint is the most difficult and points to the logic > > needing to be in userland management libraries. Possibly the only sane > > kernel space support would be "try to unbind and reset; if it works then > > the device is assignable". > > If we expose a 'reset' file in the /sys/bus/pci/devices/*/ directories > for devices that are resettable, that should be enough, I would think. Yes, currently it's only internal and it's not robust given the reset constraints. thanks, -chris -- drivers/base/base.h | 1 + drivers/base/bus.c | 2 +- drivers/base/dd.c | 15 +++++++++++++-- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/base/base.h b/drivers/base/base.h index 0a5f055..60dc346 100644 --- a/drivers/base/base.h +++ b/drivers/base/base.h @@ -86,6 +86,7 @@ extern void bus_remove_driver(struct device_driver *drv); extern void driver_detach(struct device_driver *drv); extern int driver_probe_device(struct device_driver *drv, struct device *dev); +extern int driver_bind_probe_device(struct device_driver *drv, struct device *dev); extern void sysdev_shutdown(void); extern int sysdev_suspend(pm_message_t state); diff --git a/drivers/base/bus.c b/drivers/base/bus.c index 83f32b8..ad28338 100644 --- a/drivers/base/bus.c +++ b/drivers/base/bus.c @@ -202,7 +202,7 @@ static ssize_t driver_bind(struct device_driver *drv, if (dev->parent) /* Needed for USB */ down(&dev->parent->sem); down(&dev->sem); - err = driver_probe_device(drv, dev); + err = driver_bind_probe_device(drv, dev); up(&dev->sem); if (dev->parent) up(&dev->parent->sem); diff --git a/drivers/base/dd.c b/drivers/base/dd.c index 315bed8..fba6463 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -184,13 +184,14 @@ int driver_probe_done(void) * This function must be called with @dev->sem held. When called for a * USB interface, @dev->parent->sem must be held as well. */ -int driver_probe_device(struct device_driver *drv, struct device *dev) +static int __driver_probe_device(struct device_driver *drv, struct device *dev, + bool force) { int ret = 0; if (!device_is_registered(dev)) return -ENODEV; - if (drv->bus->match && !drv->bus->match(dev, drv)) + if (!force && drv->bus->match && !drv->bus->match(dev, drv)) goto done; pr_debug("bus: '%s': %s: matched device %s with driver %s\n", @@ -202,6 +203,16 @@ done: return ret; } +int driver_probe_bind_device(struct device_driver *drv, struct device *dev) +{ + return __driver_probe_device(drv, dev, 1); +} + +int driver_probe_device(struct device_driver *drv, struct device *dev) +{ + return __driver_probe_device(drv, dev, 0); +} + static int __device_attach(struct device_driver *drv, void *data) { struct device *dev = data; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html