On 2/2/2021 10:44 PM, Jason Gunthorpe wrote:
> On Tue, Feb 02, 2021 at 12:37:23PM -0700, Alex Williamson wrote:
>> For the most part, this explicit bind interface is redundant to
>> driver_override, which already avoids the duplicate ID issue.
>
> No, the point here is to have the ID tables in the PCI drivers because
> they fundamentally only work with their supported IDs. The normal
> driver core ID tables are a replacement for all the hardwired if's in
> vfio_pci.
>
> driver_override completely disables all the ID checking; it seems only
> useful for vfio_pci, which works with everything. It should not be used
> with something like nvlink_vfio_pci.ko that needs ID checking.

The driver_override mechanism seems weird to me. In the hotplug case, if
both capable drivers (the native device driver and vfio-pci) are loaded,
they will compete for the device.

I think the proposed flag is very powerful: it fixes the original concern
Alex had ("if we start adding ids for vfio drivers then we create
conflicts with the native host driver") and it's completely deterministic.
This way we bind explicitly to a driver.

And the way we'll choose a vfio-pci driver is by device_id + vendor_id +
subsystem_device + subsystem_vendor. There shouldn't be two vfio-pci
drivers that support a device with the same four IDs above.

If no suitable vendor-vfio-pci.ko is found, you fall back to binding
vfio-pci.ko.

Each driver will publish its supported IDs in sysfs to help the user
decide.
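
To make the matching concrete, a vendor-specific variant driver would just
carry an ordinary pci_device_id table that matches on all four IDs. Here is
a minimal sketch of what I have in mind (the "foo" driver name and the IDs
are made up for illustration):

#include <linux/module.h>
#include <linux/pci.h>

/* Sketch only: a hypothetical vendor vfio-pci variant that matches on
 * vendor, device, subsystem vendor and subsystem device. */
static const struct pci_device_id foo_vfio_pci_table[] = {
	{ PCI_DEVICE_SUB(0x15b3, 0x101e, 0x15b3, 0x0022) },	/* example IDs only */
	{ 0, }
};
/* MODULE_DEVICE_TABLE() also exports the supported IDs via modules.alias,
 * so userspace can see which devices this variant claims to handle. */
MODULE_DEVICE_TABLE(pci, foo_vfio_pci_table);

static int foo_vfio_pci_probe(struct pci_dev *pdev,
			      const struct pci_device_id *id)
{
	/* device-specific vfio setup would go here */
	return 0;
}

static void foo_vfio_pci_remove(struct pci_dev *pdev)
{
}

static struct pci_driver foo_vfio_pci_driver = {
	.name		= "foo-vfio-pci",
	.id_table	= foo_vfio_pci_table,
	.probe		= foo_vfio_pci_probe,
	.remove		= foo_vfio_pci_remove,
};
module_pci_driver(foo_vfio_pci_driver);

MODULE_LICENSE("GPL");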

> Yes, this DRIVER_EXPLICIT_BIND_ONLY idea somewhat replaces
> driver_override because we could set the PCI any match on vfio_pci and
> manage the driver binding explicitly instead.
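
If I understand the DRIVER_EXPLICIT_BIND_ONLY idea correctly, vfio_pci
itself would then carry a catch-all ID table and never auto-bind. Roughly
this sketch (the wildcard entry is standard pci_device_id syntax; the
explicit-bind flag is the new behaviour being discussed, not an existing
pci_driver field, so I only note it as a comment):

/* Sketch: a catch-all match table for vfio-pci itself. */
static const struct pci_device_id vfio_pci_table[] = {
	{ PCI_DEVICE(PCI_ANY_ID, PCI_ANY_ID) },	/* matches any PCI device */
	{ 0, }
};

static struct pci_driver vfio_pci_driver = {
	.name		= "vfio-pci",
	.id_table	= vfio_pci_table,
	.probe		= vfio_pci_probe,	/* existing vfio-pci entry points */
	.remove		= vfio_pci_remove,
	/* + DRIVER_EXPLICIT_BIND_ONLY (or however it ends up being spelled),
	 * so the wildcard never auto-binds and the device is claimed only
	 * when userspace explicitly asks for vfio-pci. */
};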

>> A driver id table doesn't really help for binding the device;
>> ultimately, even if a device is in the id table it might fail to
>> probe due to the missing platform support that each of these igd and
>> nvlink drivers expose,

> What happens depends on what makes sense for the driver: it could
> continue without some missing optional support, or it could fail.
>
> IGD and nvlink can trivially carry on and work if they don't find the
> platform support.
>
> Or they might want to fail; I think the mlx5 and probably the nvlink
> drivers should fail, as they are intended to be coupled with userspace
> that expects to use their extended features.
>
> In those cases failing is a feature because it prevents the whole
> system from going into an unexpected state.
>
> Jason
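
To put that last point in code terms, a variant driver that treats its
platform support as mandatory can simply refuse the device at probe time.
A rough sketch, reusing the made-up foo_vfio_pci_probe() stub from my
earlier example (the helper is hypothetical):

static int foo_vfio_pci_probe(struct pci_dev *pdev,
			      const struct pci_device_id *id)
{
	/* foo_platform_support_present() stands in for whatever platform or
	 * firmware detection the driver really needs. */
	if (!foo_platform_support_present(pdev))
		return -ENODEV;	/* fail the bind rather than run degraded */

	/* normal vfio device setup would continue here */
	return 0;
}

Failing here leaves the device unbound (or free for plain vfio-pci.ko)
instead of handing userspace a device that silently lacks the extended
features it was counting on.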