Hi Greg, Jason,

> -----Original Message-----
> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
>
> On Fri, 8 Nov 2019 17:05:45 -0400
> Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> > On Fri, Nov 08, 2019 at 01:34:35PM -0700, Alex Williamson wrote:
> > > On Fri, 8 Nov 2019 16:12:53 -0400
> > > Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > >
> > > > On Fri, Nov 08, 2019 at 11:12:38AM -0800, Jakub Kicinski wrote:
> > > > > On Fri, 8 Nov 2019 15:40:22 +0000, Parav Pandit wrote:
> > > > > > > The new intel driver has been having a very similar
> > > > > > > discussion about how to model their 'multi function device'
> > > > > > > ie to bind RDMA and other drivers to a shared PCI function,
> > > > > > > and I think that discussion settled on adding a new bus?
> > > > > > >
> > > > > > > Really these things are all very similar, it would be nice
> > > > > > > to have a clear methodology on how to use the device core
> > > > > > > if a single PCI device is split by software into multiple
> > > > > > > different functional units and attached to different driver
> > > > > > > instances.
> > > > > > >
> > > > > > > Currently there is a lot of hacking in this area.. And a
> > > > > > > consistent scheme might resolve the ugliness with the
> > > > > > > dma_ops wrappers.
> > > > > > >
> > > > > > > We already have the 'mfd' stuff to support splitting
> > > > > > > platform devices, maybe we need to create a 'pci-mfd' to
> > > > > > > support splitting PCI devices?
> > > > > > >
> > > > > > > I'm not really clear how mfd and mdev relate, I always
> > > > > > > thought mdev was strongly linked to vfio.
> > > > > >
> > > > > > Mdev at the beginning was strongly linked to vfio, but as I
> > > > > > mentioned above it is addressing more use cases.
> > > > > >
> > > > > > I observed that discussion, but was not sure of extending
> > > > > > mdev further.
> > > > > >
> > > > > > One way for the Intel drivers to do it is after series [9],
> > > > > > where the PCI driver says MDEV_CLASS_ID_I40_FOO and the RDMA
> > > > > > driver's mdev_register_driver() matches on it and does the
> > > > > > probe().
> > > > >
> > > > > Yup, FWIW to me the benefit of reusing mdevs for the Intel case
> > > > > vs muddying the purpose of mdevs is not a clear trade off.
> > > >
> > > > IMHO, mdev has a mdev_parent_ops structure clearly intended to
> > > > link it to vfio, so using a mdev for something not related to
> > > > vfio seems like a poor choice.
> > >
> > > Unless there's some opposition, I intend to queue this for v5.5:
> > >
> > > https://www.spinics.net/lists/kvm/msg199613.html
> > >
> > > mdev started out as tied to vfio, but at its core, it's just a
> > > device life cycle infrastructure with callbacks between bus drivers
> > > and vendor devices. If virtio is on the wrong path with the above
> > > series, please speak up. Thanks,
> >
> > Well, I think Greg just objected pretty strongly.
> >
> > IMHO it is wrong to turn mdev into some API multiplexor. That is what
> > the driver core already does and AFAIK your bus type is supposed to
> > represent your API contract to your drivers.
> >
> > Since the bus type is ABI, 'mdev' is really all about vfio I guess?
> >
> > Maybe mdev should grow by factoring the special GUID life cycle stuff
> > into a helper library that can make it simpler to build proper API
> > specific buses using that lifecycle model? ie the virtio I saw
> > proposed should probably be a mdev-virtio bus type providing this new
> > virtio API contract using a 'struct mdev_virtio'?
>
> I see, the bus:API contract is more clear when we're talking about
> physical buses and physical devices following a hardware specification.
> But if we take PCI for example, each PCI device has its own internal
> API that operates on the bus API. PCI bus drivers match devices based
> on vendor and device ID, which defines that internal API, not the bus
> API. The bus API is pretty thin when we're talking virtual devices and
> virtual buses though. The bus "API" is essentially that lifecycle
> management, so I'm having a bit of a hard time differentiating this
> from saying "hey, that PCI bus is nice, but we can't have drivers
> using their own API on the same bus, so can we move the config space,
> reset, hotplug, etc, stuff into helpers and come up with an (ex.)
> mlx5_bus instead?" Essentially for virtual devices, we're dictating a
> bus per device type, whereas it seemed like a reasonable idea at the
> time to create a common virtual device bus, but maybe it went into the
> weeds when trying to figure out how device drivers match to devices on
> that bus and actually interact with them.
>
> > I only looked briefly but mdev seems like an unusual way to use the
> > driver core. *generally* I would expect that if a driver wants to
> > provide a foo_device (on a foo bus, providing the foo API contract)
> > it looks very broadly like:
> >
> > struct foo_device {
> >         struct device dev;
> >         const struct foo_ops *ops;
> > };
> > struct my_foo_device {
> >         struct foo_device fdev;
> > };
> >
> > foo_device_register(&mydev->fdev);

If I understood Greg's direction on using a bus and Jason's 'mdev-virtio'
suggestion correctly, the user has one of the three use cases I described
in the cover letter: create a sub device, configure it, and once it is
configured, map it to the right bus driver based on the use case:

1. mdev-vfio (no demux business)
2. virtio (new bus)
3. mlx5_bus (new bus)

So should we be creating 3 different buses, instead of the mdev bus being
a de-multiplexer for them? That is, depending on the device flavour
specified, create the device on the right bus? For example:

$ devlink create subdev pci/0000:05:00.0 flavour virtio name foo subdev_id 1
$ devlink create subdev pci/0000:05:00.0 flavour mdev <uuid> subdev_id 2
$ devlink create subdev pci/0000:05:00.0 flavour mlx5 id 1 subdev_id 3

$ devlink subdev pci/0000:05:00.0/<subdev_id> config <params>
$ echo <respective_device_id> > <sysfs_path>/bind

Should we implement power management callbacks on all of the above 3
buses as well? And should mlx5_bus be abstracted out into a more generic
virtual bus (vdev bus?) so that multiple vendors can reuse it?
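
To make the "one bus per API contract" direction a bit more concrete,
here is a minimal sketch of what such a generic sub-device ("vdev") bus
could look like if we follow the foo_device pattern quoted above. All of
the names (vdev_bus, vdev_device, vdev_driver, the register helpers) are
made up purely for illustration; none of them exist today, and the usual
release/probe/remove and bus_register() plumbing is omitted:

    /* Hypothetical sketch only; none of these symbols exist today. */
    #include <linux/device.h>
    #include <linux/string.h>

    struct vdev_device {
            struct device dev;
            const char *match_name;      /* what drivers bind against */
    };

    struct vdev_driver {
            struct device_driver driver;
            const char *match_name;
            int (*probe)(struct vdev_device *vdev);
            void (*remove)(struct vdev_device *vdev);
    };

    #define to_vdev_device(d) container_of(d, struct vdev_device, dev)
    #define to_vdev_driver(d) container_of(d, struct vdev_driver, driver)

    /* Drivers match on a plain name, similar to platform devices */
    static int vdev_match(struct device *dev, struct device_driver *drv)
    {
            return !strcmp(to_vdev_device(dev)->match_name,
                           to_vdev_driver(drv)->match_name);
    }

    static struct bus_type vdev_bus = {
            .name  = "vdev",
            .match = vdev_match,
    };

    /* Called by the parent PCI driver to place a sub device on the bus */
    static int vdev_device_register(struct vdev_device *vdev,
                                    struct device *parent)
    {
            vdev->dev.bus = &vdev_bus;
            vdev->dev.parent = parent;
            return device_register(&vdev->dev);
    }

    /* A consumer driver (RDMA, netdev, ...) registers the same way */
    static int vdev_driver_register(struct vdev_driver *drv)
    {
            drv->driver.bus = &vdev_bus;
            return driver_register(&drv->driver);
    }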
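
And for the flavour question, a rough sketch of how the parent PCI
driver could dispatch a devlink "create subdev ... flavour X" request
onto the right bus; again, the enum and the three create helpers are
hypothetical names only, not an existing API:

    #include <linux/device.h>
    #include <linux/errno.h>
    #include <linux/types.h>

    /* Hypothetical flavours matching the devlink commands above */
    enum subdev_flavour {
            SUBDEV_FLAVOUR_MDEV,    /* vfio passthrough, existing mdev bus */
            SUBDEV_FLAVOUR_VIRTIO,  /* new virtio sub-device bus */
            SUBDEV_FLAVOUR_MLX5,    /* vendor bus, or the generic vdev bus */
    };

    /* Each helper would allocate the sub device and register it on its bus */
    int subdev_create_mdev(struct device *parent, u32 subdev_id);
    int subdev_create_virtio(struct device *parent, u32 subdev_id);
    int subdev_create_vdev(struct device *parent, u32 subdev_id);

    static int subdev_create(struct device *parent,
                             enum subdev_flavour flavour, u32 subdev_id)
    {
            switch (flavour) {
            case SUBDEV_FLAVOUR_MDEV:
                    return subdev_create_mdev(parent, subdev_id);
            case SUBDEV_FLAVOUR_VIRTIO:
                    return subdev_create_virtio(parent, subdev_id);
            case SUBDEV_FLAVOUR_MLX5:
                    return subdev_create_vdev(parent, subdev_id);
            default:
                    return -EINVAL;
            }
    }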