On Fri, 2014-03-28 at 12:58 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 26, 2014 at 04:09:21PM -0600, Alex Williamson wrote:
> > On Wed, 2014-03-26 at 10:21 -0600, Alex Williamson wrote:
> > > On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > > >
> > > > > On 26.03.2014 at 22:40, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> > > > >
> > > > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > > > >> Hi Greg,
> > > > >>
> > > > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > > > >> closed that has been percolating for a while around creating a mechanism
> > > > >> that will allow kernel drivers like vfio to bind to devices of any type.
> > > > >>
> > > > >> This thread with you:
> > > > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > > > >> ...seems to have died out, so am trying to get your response
> > > > >> and will summarize again.  Vfio drivers in the kernel (regardless of
> > > > >> bus type) need to bind to devices of any type.  The driver's function
> > > > >> is to simply export hardware resources of any type to user space.
> > > > >>
> > > > >> There are several approaches that have been proposed:
> > > > >
> > > > > You seem to have missed the one I proposed.
> > > > >>
> > > > >> 1.  new_id -- (current approach) the user explicitly registers
> > > > >>     each new device type with the vfio driver using the new_id
> > > > >>     mechanism.
> > > > >>
> > > > >>     Problem: multiple drivers will be resident that handle the
> > > > >>     same device type...and there is nothing user space hotplug
> > > > >>     infrastructure can do to help.
> > > > >>
> > > > >> 2.  "any id" -- the vfio driver could specify a wildcard match
> > > > >>     of some kind in its ID match table which would allow it to
> > > > >>     match and bind to any possible device id.  However,
> > > > >>     we don't want the vfio driver grabbing _all_ devices...just the ones we
> > > > >>     explicitly want to pass to user space.
> > > > >>
> > > > >>     The proposed patch to support this was to create a new flag
> > > > >>     "sysfs_bind_only" in struct device_driver.  When this flag
> > > > >>     is set, the driver can only bind to devices via the sysfs
> > > > >>     bind file.  This would allow the wildcard match to work.
> > > > >>
> > > > >>     Patch is here:
> > > > >>     https://lkml.org/lkml/2013/12/3/253
> > > > >>
> > > > >> 3.  "Driver initiated explicit bind" -- with this approach the
> > > > >>     vfio driver would create a private 'bind' sysfs object
> > > > >>     and the user would echo the requested device into it:
> > > > >>
> > > > >>     echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > > > >>
> > > > >>     In order to make that work, the driver would need to call
> > > > >>     driver_probe_device() and thus we need this patch:
> > > > >>     https://lkml.org/lkml/2014/2/8/175
> > > > >
> > > > > 4). Use the 'unbind' (from the original device) and 'bind' to vfio driver.
> > > >
> > > > This is approach 2, no?
> > > >
> > > > >
> > > > > Which I think is what is currently being done. Why is that not sufficient?
> > > >
> > > > What would 'bind to vfio driver' look like?
> > > >
> > > > > The only thing I see in the URL is " That works, but it is ugly."
> > > > > There is some mention of race but I don't see how - if you do the 'unbind'
> > > > > on the original driver and then bind the BDF to the VFIO how would you get
> > > > > a race?
> > > >
> > > > Typically on PCI, you do a
> > > >
> > > > - add wildcard (pci id) match to vfio driver
> > > > - unbind driver
> > > >   -> reprobe
> > > >   -> device attaches to vfio driver because it is the least recent match
> > > > - remove wildcard match from vfio driver
> > > >
> > > > If in between you hotplug add a card of the same type, it gets attached
> > > > to vfio - even though the logical "default driver" would be the device
> > > > specific driver.
> > >
> > > I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> > > really factoring it into the discussion.  drivers_autoprobe allows us to
> > > toggle two points:
> > >
> > > a) When a new device is added whether we automatically give drivers a
> > > try at binding to it
> > >
> > > b) When a new driver is added whether it gets to try to bind to anything
> > > in the system
> > >
> > > So we do have a mechanism to avoid the race, but the problem is that it
> > > becomes the responsibility of userspace to:
> > >
> > > 1) turn off drivers_autoprobe
> > > 2) unbind/new_id/bind/remove_id
> > > 3) turn on drivers_autoprobe
> > > 4) call drivers_probe for anything added between 1) & 3)
> > >
> > > Is the question about the ugliness of the current solution whether it's
> > > unreasonable to ask userspace to do this?
> > >
> > > What we seem to be asking for above is more like an autoprobe flag per
> > > driver where there's some way for this special driver to opt out of auto
> > > probing.  Option 2. in Stuart's list does this by short-cutting ID
> > > matching so that a "match" is only found when using the sysfs bind path,
> > > option 3. enables a way for a driver to expose their own sysfs entry
> > > point for binding.  The latter feels particularly chaotic since drivers
> > > get to make-up their own bind mechanism.
> > >
> > > Another twist I'll throw in is that devices can be hot added to IOMMU
> > > groups that are in-use by userspace.  When that happens we'd like to be
> > > able to disable driver autoprobe of the device to avoid a host driver
> > > automatically binding to the device.  I wonder if instead of looking at
> > > the problem from the driver perspective, if we were to instead look at
> > > it from the device perspective if we might find a solution that would
> > > address both.
> > > For instance, if devices had a driver_probe_id property
> > > that was by default set to their bus specific ID match ("$VENDOR
> > > $DEVICE" on PCI) could we use that to write new match IDs so that a
> > > device could only bind to a given driver?  Effectively we could then
> > > bind either using the current method of adding to the list of IDs a
> > > driver will match or changing the ID that a device would match.  Does
> > > that get us anywhere?  Thanks,
> >
> > Here's one way this might work for PCI; note that we can do this
> > entirely in the bus driver for PCI.  Bind/unbind would go like this:
> >
> > # bind device to vfio-pci
> > echo vfio-pci > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> > echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> > echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
> >
> > # bind device back to host driver
> > echo > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> > echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> > echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
> >
> > When preferred_driver is set for a device it will match and bind only to
> > a driver with a matching name.  This also means we can write random
> > strings here to avoid a device being bound to any driver if we want.
> >
> > In the example patch below I've put the preferred_driver in the struct
> > pci_dev, but if this mechanism were adopted by multiple devices perhaps
> > we could add it to struct device.  Would something like this work for
> > platform devices?
> >
> > Note 1, the below is just the core PCI driver change to support this,
> > there's some trivial collateral damage from changing an exported
> > function not shown here for brevity.
> >
> > Note 2, PCI passes a struct pci_device_id to the driver probe function
> > which would be NULL in the preferred driver case of the example below.
> > We'd need to dynamically create one of these when calling the probe
> > function to make this practical for drivers that use that data.  Thanks,
>
> That is I think a much easier way. Though I would just call
> it 'override' instead of preferred_driver, since well, that is its
> intent.
>
> Thank you for prototyping it!

I've realized since this first draft that returning NULL for the
pci_device_id would be unexpected for a number of drivers and would
probably cause null pointer dereferences.  This is an implementation
detail though; we probably want a static "any ID" pci_device_id to
return in the case that there are no static table or dynid matches yet
we still want the override to match.  This should result in a smaller
patch.  I'll wait for feasibility from the platform folks before I do
another revision though.  Thanks,

Alex

> > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> >
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index d911e0c..9425920 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -203,17 +203,23 @@ ATTRIBUTE_GROUPS(pci_drv);
> >   * Deprecated, don't use this as it will not catch any dynamic ids
> >   * that a driver might want to check for.
> >   */
> > -const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
> > -					 struct pci_dev *dev)
> > +int pci_match_id(const struct pci_device_id *ids, struct pci_dev *dev,
> > +		 const struct pci_device_id **id)
> >  {
> > +	if (id)
> > +		*id = NULL;
> > +
> >  	if (ids) {
> >  		while (ids->vendor || ids->subvendor || ids->class_mask) {
> > -			if (pci_match_one_device(ids, dev))
> > -				return ids;
> > +			if (pci_match_one_device(ids, dev)) {
> > +				if (id)
> > +					*id = ids;
> > +				return 1;
> > +			}
> >  			ids++;
> >  		}
> >  	}
> > -	return NULL;
> > +	return 0;
> >  }
> >
> >  /**
> > @@ -225,22 +231,30 @@ const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
> >   * system is in its list of supported devices.  Returns the matching
> >   * pci_device_id structure or %NULL if there is no match.
> >   */
> > -static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
> > -						    struct pci_dev *dev)
> > +static int pci_match_device(struct pci_driver *drv, struct pci_dev *dev,
> > +			    const struct pci_device_id **id)
> >  {
> >  	struct pci_dynid *dynid;
> >
> > +	if (id)
> > +		*id = NULL;
> > +
> > +	if (dev->preferred_driver)
> > +		return !strcmp(drv->name, dev->preferred_driver);
> > +
> >  	/* Look at the dynamic ids first, before the static ones */
> >  	spin_lock(&drv->dynids.lock);
> >  	list_for_each_entry(dynid, &drv->dynids.list, node) {
> >  		if (pci_match_one_device(&dynid->id, dev)) {
> >  			spin_unlock(&drv->dynids.lock);
> > -			return &dynid->id;
> > +			if (id)
> > +				*id = &dynid->id;
> > +			return 1;
> >  		}
> >  	}
> >  	spin_unlock(&drv->dynids.lock);
> >
> > -	return pci_match_id(drv->id_table, dev);
> > +	return pci_match_id(drv->id_table, dev, id);
> >  }
> >
> >  struct drv_dev_and_id {
> > @@ -342,8 +356,7 @@ __pci_device_probe(struct pci_driver *drv, struct pci_dev *pci_dev)
> >  	if (!pci_dev->driver && drv->probe) {
> >  		error = -ENODEV;
> >
> > -		id = pci_match_device(drv, pci_dev);
> > -		if (id)
> > +		if (pci_match_device(drv, pci_dev, &id))
> >  			error = pci_call_probe(drv, pci_dev, id);
> >  		if (error >= 0)
> >  			error = 0;
> > @@ -1272,17 +1285,12 @@ static int pci_bus_match(struct device *dev, struct device_driver *drv)
> >  {
> >  	struct pci_dev *pci_dev = to_pci_dev(dev);
> >  	struct pci_driver *pci_drv;
> > -	const struct pci_device_id *found_id;
> >
> >  	if (!pci_dev->match_driver)
> >  		return 0;
> >
> >  	pci_drv = to_pci_driver(drv);
> > -	found_id = pci_match_device(pci_drv, pci_dev);
> > -	if (found_id)
> > -		return 1;
> > -
> > -	return 0;
> > +	return pci_match_device(pci_drv, pci_dev, NULL);
> >  }
> >
> >  /**
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > index 4e0acef..d6075f8 100644
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -222,6 +222,46 @@ static ssize_t enabled_show(struct device *dev,
> >  }
> >  static DEVICE_ATTR_RW(enabled);
> >
> > +static ssize_t preferred_driver_store(struct device *dev,
> > +				      struct device_attribute *attr,
> > +				      const char *buf, size_t count)
> > +{
> > +	struct pci_dev *pdev = to_pci_dev(dev);
> > +	char *preferred_driver, *old = pdev->preferred_driver;
> > +
> > +	if (count > PATH_MAX)
> > +		return -EINVAL;
> > +
> > +	preferred_driver = kstrndup(buf, count, GFP_KERNEL);
> > +	if (!preferred_driver)
> > +		return -ENOMEM;
> > +
> > +	while (strlen(preferred_driver) &&
> > +	       preferred_driver[strlen(preferred_driver) - 1] == '\n')
> > +		preferred_driver[strlen(preferred_driver) - 1] = '\0';
> > +
> > +	if (strlen(preferred_driver)) {
> > +		pdev->preferred_driver = preferred_driver;
> > +	} else {
> > +		kfree(preferred_driver);
> > +		pdev->preferred_driver = NULL;
> > +	}
> > +
> > +	if (old)
> > +		kfree(old);
> > +
> > +	return count;
> > +}
> > +
> > +static ssize_t preferred_driver_show(struct device *dev,
> > +				     struct device_attribute *attr, char *buf)
> > +{
> > +	struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > +	return sprintf(buf, "%s\n", pdev->preferred_driver);
> > +}
> > +static DEVICE_ATTR_RW(preferred_driver);
> > +
> >  #ifdef CONFIG_NUMA
> >  static ssize_t
> >  numa_node_show(struct device *dev, struct device_attribute *attr, char *buf)
> > @@ -521,6 +561,7 @@ static struct attribute *pci_dev_attrs[] = {
> >  #if defined(CONFIG_PM_RUNTIME) && defined(CONFIG_ACPI)
> >  	&dev_attr_d3cold_allowed.attr,
> >  #endif
> > +	&dev_attr_preferred_driver.attr,
> >  	NULL,
> >  };
> >
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index aab57b4..6fecb0a 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -365,6 +365,7 @@ struct pci_dev {
> >  #endif
> >  	phys_addr_t rom; /* Physical address of ROM if it's not from the BAR */
> >  	size_t romlen; /* Length of ROM if it's not from the BAR */
> > +	char *preferred_driver; /* Preferred driver, supersedes ID matching */
> >  };
> >
> >  static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
> > @@ -1111,8 +1112,8 @@ int pci_add_dynid(struct pci_driver *drv,
> >  		  unsigned int subvendor, unsigned int subdevice,
> >  		  unsigned int class, unsigned int class_mask,
> >  		  unsigned long driver_data);
> > -const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
> > -					 struct pci_dev *dev);
> > +int pci_match_id(const struct pci_device_id *ids, struct pci_dev *dev,
> > +		 const struct pci_device_id **id);
> >  int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max,
> >  		    int pass);
> >
> >
> > _______________________________________________
> > iommu mailing list
> > iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > https://lists.linuxfoundation.org/mailman/listinfo/iommu

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
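
For illustration only, the static "any ID" fallback mentioned in the reply
could look roughly like the userspace sketch below.  This is not kernel
code and not the next revision of the patch.  The structs are cut-down
stand-ins, and pci_device_id_any, match_table() and match_device() are
hypothetical names invented for this example.  Dynamic IDs are left out,
and the sketch assumes the only requirement is that probe() receives some
non-NULL ID when the override matches but no table entry does.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures, not the real layouts. */
struct pci_device_id {
	unsigned int vendor;
	unsigned int device;
};

struct pci_dev {
	unsigned int vendor;
	unsigned int device;
	const char *preferred_driver;	/* NULL: use normal ID matching */
};

struct pci_driver {
	const char *name;
	const struct pci_device_id *id_table;	/* ends with a zeroed entry */
};

/* Hypothetical wildcard entry returned when only the override matched. */
static const struct pci_device_id pci_device_id_any = { ~0u, ~0u };

/* Walk the static table and return the matching entry, or NULL. */
static const struct pci_device_id *
match_table(const struct pci_device_id *ids, const struct pci_dev *dev)
{
	if (!ids)
		return NULL;
	for (; ids->vendor || ids->device; ids++)
		if (ids->vendor == dev->vendor && ids->device == dev->device)
			return ids;
	return NULL;
}

/* Return the ID to hand to probe(), or NULL if drv must not bind to dev. */
static const struct pci_device_id *
match_device(const struct pci_driver *drv, const struct pci_dev *dev)
{
	const struct pci_device_id *id;

	/* An override naming some other driver blocks this driver. */
	if (dev->preferred_driver && strcmp(drv->name, dev->preferred_driver))
		return NULL;

	/* Prefer a real table entry so probe() still sees its driver data. */
	id = match_table(drv->id_table, dev);
	if (id)
		return id;

	/* No table entry: bind only when the override named this driver. */
	return dev->preferred_driver ? &pci_device_id_any : NULL;
}
```

In this shape the match function keeps returning a pci_device_id pointer,
so a caller never sees NULL for a successful match.  That may be one way
the signature change to pci_match_id() could be avoided, though that is a
guess about the eventual patch rather than something stated in the thread.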