On Tue, Jan 19, 2021 at 07:56:10PM +0100, Cornelia Huck wrote: > On Mon, 18 Jan 2021 14:16:26 -0400 > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > On Mon, Jan 18, 2021 at 05:00:09PM +0100, Cornelia Huck wrote: > > > > > > You can say that all the HW specific things are in the mlx5_vfio_pci > > > > driver. It is an unusual driver because it must bind to both the PCI > > > > VF with a pci_driver and to the mlx5_core PF using an > > > > auxiliary_driver. This is needed for the object lifetimes to be > > > > correct. > > > > > > Hm... I might be confused about the usage of the term 'driver' here. > > > IIUC, there are two drivers, one on the pci bus and one on the > > > auxiliary bus. Is the 'driver' you're talking about here more the > > > module you load (and not a driver in the driver core sense?) > > > > Here "driver" would be the common term meaning the code that realizes > > a subsytem for HW - so mlx5_vfio_pci is a VFIO driver because it > > ultimately creates a /dev/vfio* through the vfio subsystem. > > > > The same way we usually call something like mlx5_en an "ethernet > > driver" not just a "pci driver" > > > > > Yes, sure. But it also shows that mlx5_vfio_pci aka the device-specific > > > code is rather small in comparison to the common vfio-pci code. > > > Therefore my question whether it will gain more specific changes (that > > > cannot be covered via the auxiliary driver.) > > > > I'm not sure what you mean "via the auxiliary driver" - there is only > > one mlx5_vfio_pci, and the non-RFC version with all the migration code > > is fairly big. > > > > The pci_driver contributes a 'struct pci_device *' and the > > auxiliary_driver contributes a 'struct mlx5_core_dev *'. mlx5_vfio_pci > > fuses them together into a VFIO device. Depending on the VFIO > > callback, it may use an API from the pci_device or from the > > mlx5_core_dev device, or both. > > Let's rephrase my question a bit: > > This proposal splits the existing vfio-pci driver into a "core" > component and code actually implementing the "driver" part. For mlx5, > an alternative "driver" is introduced that reuses the "core" component > and also hooks into mlx5-specific code parts via the auxiliary device > framework. Yes, I think you understand it well > (IIUC, the plan is to make existing special cases for devices follow > mlx5's lead later.) Well, it is a direction to go. I think adding 'if pci matches XX then squeeze in driver Y' to vfio-pci was a hacky thing to do, this is a way out. We could just add 'if pci matches mlx5 then squeeze in driver mlx5' too - but that is really too horific to seriously consider. > I've been thinking of an alternative split: Keep vfio-pci as it is now, > but add an auxiliary device. vfio-pci cannot use auxiliary device. It is only for connecting parts of the same driver together. vfio-pci has no other parts to connect. Further, that is not how the driver core in Linux is designed to work. We don't have subsytems provide a pci_driver and then go look around for a 2nd driver to somehow mix in. How would it know what driver to pick? How would drivers be autoloaded? How can the user know things worked out right? What if they didn't want that? So many questions. The standard driver core flow is always pci_driver -> subsystem -> userspace I don't think VFIO's needs are special, so why deviate? > I guess my question is: into which callbacks will the additional > functionality hook? If there's no good way to do what they need to do > without manipulating the vfio-pci calls, my proposal will not work, and > this proposal looks like the better way. But it's hard to tell without > seeing the code, which is why I'm asking :) Well, could we have done the existing special devices we have today with that approach? What about the Intel thing I saw RFC'd a while ago? Or the next set of mlx5 devices beyond storage? Or an future SIOV device? If you have doubts the idea is flexible enough, then I think you already answered the question :) LWN had a good article on these design patterns. This RFC is following what LWN called the "tool box" pattern. We keep the driver and subsystem close together and provide useful tools to share common code. vfio_pci_core's stuff is a tool. It is a proven and flexable pattern. I think you are suggesting what LWN called a "midlayer mistake" pattern. While that can be workable, it doesn't seem right here. vfio-pci is not really a logical midlayer, and something acting as a midlayer should never have a device_driver.. To quote lwn: The core thesis of the "midlayer mistake" is that midlayers are bad and should not exist. That common functionality which it is so tempting to put in a midlayer should instead be provided as library routines which can used, augmented, or ignored by each bottom level driver independently. Thus every subsystem that supports multiple implementations (or drivers) should provide a very thin top layer which calls directly into the bottom layer drivers, and a rich library of support code that eases the implementation of those drivers. This library is available to, but not forced upon, those drivers. Given the exciting future of VFIO I belive strongly the above is the right general design direction. Jason