On Tue, 11 Jul 2017 11:46:27 +0200
Greg KH <greg@xxxxxxxxx> wrote:

> On Mon, Jul 10, 2017 at 03:34:12PM -0600, Alex Williamson wrote:
> > On Mon, 26 Jun 2017 10:08:55 +0100
> > Russell King - ARM Linux <linux@xxxxxxxxxxxxxxx> wrote:
> >
> > > On Tue, Jun 20, 2017 at 09:48:31AM -0600, Alex Williamson wrote:
> > > > If a device is bound to a non-vfio, non-whitelisted driver while a
> > > > group is in use, then the integrity of the group is compromised and
> > > > will result in hitting a BUG_ON.  This code tries to avoid this case
> > > > by mangling driver_override to force a no-match for the driver.  The
> > > > driver-core will either follow-up with a DRIVER_NOT_BOUND (preferred)
> > > > or BOUND_DRIVER, at which point we can remove the driver_override
> > > > mangling.
> > >
> > > Rather than mangling the driver override string to prevent driver binding,
> > > I wonder if it would make more sense to allow the BUS_NOTIFY_BIND_DRIVER
> > > notifier to fail the device probe?
> >
> > Well, it seemed like a good idea, but I don't think we're getting any
> > traction here, the thread has gone cold:
> >
> > https://lkml.org/lkml/2017/6/27/1002
> >
> > Greg, any further comments?
>
> I still think your drivers should be fixed, adding
> yet-another-odd-interaction with the driver core is ripe for added
> complexity...

Hi Greg,

Let me give a concrete scenario, I have a dual-port conventional PCI
e1000 NIC.  The IOMMU operates on PCIe requester IDs and therefore both
NIC functions are masked behind the requester ID of a PCIe-to-PCI
bridge.  We cannot have the e1000 driver managing one function and a
user managing the other (via vfio-pci).  In this case, not only is the
DMA not isolated but the functions share the same IOMMU context.
Therefore in order to allow the user access to one function via
vfio-pci, the other function needs to be in a known state, either also
bound to vfio-pci, bound to an innocuous driver like pci-stub, or
unbound from any driver.
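
As a rough illustration of the known-state rule described above, here is
a minimal userspace sketch, not kernel code.  The struct and function
names are hypothetical and exist only for this example; the only facts
taken from the scenario are that a sibling function must be unbound,
bound to vfio-pci, or bound to an innocuous driver such as pci-stub.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one device in an IOMMU group (illustration only). */
struct dev_model {
	const char *driver;	/* name of the bound driver, NULL if unbound */
};

/* A device is in a "known state" if it is unbound, bound to vfio-pci,
 * or bound to an innocuous stub driver such as pci-stub. */
static bool driver_is_safe(const char *driver)
{
	if (driver == NULL)
		return true;
	return strcmp(driver, "vfio-pci") == 0 ||
	       strcmp(driver, "pci-stub") == 0;
}

/* The group can be handed to a user only if every device in it is in a
 * known state; a single host driver (e.g. e1000) breaks the group. */
static bool group_is_viable(const struct dev_model *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!driver_is_safe(devs[i].driver))
			return false;
	}
	return true;
}
```

In this model, a group of { vfio-pci, unbound } or { vfio-pci, pci-stub }
is viable, while { vfio-pci, e1000 } is not, which is the dual-port case
above.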

Given this state, the user now has access to one function of the device,
but how can we fix our driver to manage the other function?  If the
other function is also bound to vfio-pci, the driver core does not allow
us to refuse a driver remove request; the best we can do is block for a
while, but we best not do that too long, so we end up in the device
unbound state.  Likewise, if the other function was bound to pci-stub,
this driver won't block remove, so the device for the other port can
transition to an unbound state.  Once in an unbound state, how would
fixing either the vfio-pci or the core vfio driver prevent the scenario
which can now happen of the unbound device being bound to the host e1000
driver?

This can happen in pure PCIe topologies as well, where perhaps the IOMMU
context is not shared, but the devices still lack DMA isolation within
the group.

The only tool we currently have to manage this scenario is that the vfio
core driver can pull BUG_ON after the fact of the other device being
bound to a host driver.  Understandably, users aren't so keen on this,
which is why I'm trying to allow vfio to block binding of that other
device before it happens.

None of this really seems to fall within the capabilities of the
existing driver core, so simply fixing my driver doesn't seem to be a
well defined option.  Is there a simple solution I'm missing?  We're not
concerned only with auto-probing, we need to protect against explicit
bind attempts as well.

> And, as there's no real patch for me to do anything with (hint, I can't
> apply RFC patches), I don't know what I can do here...

Certainly continuing the discussion is all I'm asking for at this
point.  The RFC didn't tickle your fancy, but the reply also didn't
convey an appreciation of the circumstances.  I hope that perhaps this
gets us a step closer so we can decide which way to go.

Thanks,
Alex