Hi Greg, On Thu, 13 Jul 2017 10:23:14 +0200 Greg KH <greg@xxxxxxxxx> wrote: > On Tue, Jul 11, 2017 at 10:41:16AM -0600, Alex Williamson wrote: > > Let me give a concrete scenario, I have a dual-port conventional PCI > > e1000 NIC. The IOMMU operates on PCIe requester IDs and therefore both > > NIC functions are masked behind the requester ID of a PCIe-to-PCI > > bridge. We cannot have the e1000 driver managing one function and a > > user managing the other (via vfio-pci). > > Agreed, but really, if a user asks to do such a thing, they deserve the > pieces the kernel ends up in, right? I think that's asking a fair bit from users to understand these nuances; at one point in time they can bind the device to e1000 and it works, at another point in time the same operation crashes the system. Perhaps the user is not even using manual binding, maybe the e1000 driver is freshly loaded and probes any devices that are not bound. Maybe the device is passed through /sys/bus/pci/drivers_probe. If the user attempts to bind a device to the wrong driver we don't intentionally run the system into the ground to make that work. Of course if the user is overriding a match with dynamic IDs or driver_override, then I fully agree, the user should be responsible for the action they've dictated. I tend to think of this more towards the former than the latter. > > In this case, not only is the > > DMA not isolated but the functions share the same IOMMU context. > > Therefore in order to allow the user access to one function via > > vfio-pci, the other function needs to be in a known state, either also > > bound to vfio-pci, bound to an innocuous driver like pci-stub, or > > unbound from any driver. Given this state, user now has access to one > > function of the device, but how can we fix our driver to manage the > > other function? > > We have USB drivers that do this all the time, due to crazy USB specs > that required it. The cdc-acm driver is one example, and I think there > are a number of USB sound devices that also have this issue. Just have > the driver of the "first" device grab the second one as well and "know" > about the resources involved, as you are doing today. > > But, then you somehow seem to have the requirement to prevent userspace > from mucking around in your driver bindings, and really, you shouldn't > care about this, because again, if it messes up here, all bets are off. > > > If the other function is also bound to vfio-pci, the driver core does > > not allow us to refuse a driver remove request, the best we can do is > > block for a while, but we best not do that too long so we end up in the > > device unbound state. > > > > Likewise, if the other function was bound to pci-stub, this driver won't > > block remove, so the device for the other port can transition to an > > unbound state. > > > > Once in an unbound state, how would fixing either the vfio-pci or the > > core vfio driver prevent the scenario which can now happen of the > > unbound device being bound to the host e1000 driver? This can happen > > in pure PCIe topologies as well where perhaps the IOMMU context is not > > shared, but the devices still lack DMA isolation within the group. > > > > The only tool we currently have to manage this scenario is that the > > vfio core driver can pull BUG_ON after the fact of the other device > > being bound to a host driver. > > Well, how about just locking the device down, don't crash the kernel. > That at least gives userspace a chance to figure out what they did was > wrong. Locking down the device is an interesting strategy, I'll look into this more. I think we'd be locking down the user/vfio owned device, locking down the device for non-vfio use is essentially what I've been trying to do with blocking driver binding. With PCI devices we have the bus-master bit that we could clear and prevent the user from changing to theoretically stop all DMA for the device. We'd also need to destroy the IOMMU domain or else the IOMMU driver is going to break because the wrong domain type is setup for the host owned device. However, if we manage to do all of this perfectly, the result would be that the user owned device quietly stops working, perhaps only a dmesg log entry to identify what happened. I can imagine many bugs filed and much time wasted looking for that magic breadcrumb. Perhaps the better approach is to try to request the conflicting device(s) back from the user, and failing that kill the user process, ultimately falling back to the existing BUG_ON if we haven't resolved the compromising scenario. A harsher approach, but there's certainly the argument that we're trying to follow the user command of binding to the native host driver. > > Understandably, users aren't so keen on > > this, which is why I'm trying to allow vfio to block binding of that > > other device before it happens. None of this really seems to fall > > within the capabilities of the existing driver core, so simply fixing > > my driver doesn't seem to be a well defined option. Is there a simple > > solution I'm missing? We're not concerned only with auto-probing, we > > need to protect against explicit bind attempts as well. > > Again, if userspace does an explicit bind/unbind attempt, it _HAS_ to > know what it is doing, we can't protect ourselves from that, that's > always been the case. The bind/unbind was done as a way for people to > say "I know more about what is going on than the kernel does right now, > so I'm going to override it." and we have to trust that they do know > that. Don't spend a lot of time and energy trying to protect yourself > from that please. Thanks for the advice, I appreciate your time on this. I'd still like to improve the current behavior, but I'll see what I can do on the side of imposing user consequences before taking the head shot. Thanks, Alex