Re: [PATCH v2 3/3] qom: Link multiple numa nodes to device using a new object

Alex Williamson <alex.williamson@xxxxxxxxxx> · Tue, 17 Oct 2023 10:54:19 -0600

On Tue, 17 Oct 2023 12:28:30 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Tue, Oct 17, 2023 at 09:21:16AM -0600, Alex Williamson wrote:
> 
> > Do we therefore need some programatic means for the kernel driver to
> > expose the node configuration to userspace?  What interfaces would
> > libvirt like to see here?  Is there an opportunity that this could
> > begin to define flavors or profiles for variant devices like we have
> > types for mdev devices where the node configuration would be
> > encompassed in a device profile?   
> 
> I don't think we should shift this mess into the kernel..
> 
> We have a wide range of things now that the orchestration must do in
> order to prepare that are fairly device specific. I understand in K8S
> configurations the preference is using operators (aka user space
> drivers) to trigger these things.
> 
> Supplying a few extra qemu command line options seems minor compared
> to all the profile and provisioning work that has to happen for other
> device types.

This seems to be a growing problem for things like mlx5-vfio-pci where
there's non-trivial device configuration necessary to enable migration
support.  It's not super clear to me how those devices are actually
expected to be used in practice with that configuration burden.

Are we simply saying here that it's implicit knowledge that the
orchestration must posses that when assigning devices exactly matching
10de:2342 or 10de:2345 when bound to the nvgrace-gpu-vfio-pci driver
that 8 additional NUMA nodes should be added to the VM and an ACPI
generic initiator object created linking those additional nodes to the
assigned GPU?

Is libvirt ok with that specification or are we simply going to bubble
this up as a user problem? Thanks,

Alex