On 7/18/19 12:29 PM, Laine Stump wrote:
On 7/18/19 10:29 AM, Daniel Henrique Barboza wrote:
Hi,
I have a PoC that enables partial coldplug assignment of multifunction
PCI devices in managed mode. At the moment, libvirt can't handle this
scenario: the code detaches only the hostdevs listed in the XML, when
in fact the whole IOMMU group needs to be detached. This can be
verified by the fact that libvirt handles the unmanaged scenario well,
as long as the user detaches the whole IOMMU group beforehand.
I have played with 2 approaches. The one I am planning to contribute
back is a change inside virHostdevGetPCIHostDeviceList() that adds the
extra PCI devices for detach/re-attach in case a PCI multifunction
device in managed mode is present in the XML.
If you're thinking of doing that automatically, then I should warn you
that we had discussed that a long time ago, and decided that it was a
bad idea to do it because it was likely someone would, e.g., try to
assign an audio device to their guest that happened to be one function
on a multifunction device that also contained a disk controller (or
some other device) that the host needed for proper operation.
Let's say that I have a multifunction PCI card with 4 functions, and I
want a guest to use only function 0 of that card. At the moment, I'm
only able to do that if I manually execute nodedev-detach on all 4
functions beforehand and use function 0 as a hostdev with
managed=false.
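The manual procedure described here can be sketched roughly as follows.
This is only an illustration, not libvirt code: it assumes the usual
Linux sysfs layout for IOMMU groups and libvirt's pci_DDDD_BB_SS_F node
device naming, and the helper names (nodedev_name, iommu_group_members,
detach_commands) and the example address 0000:01:00.0 are made up for
the example. It only builds the command strings and does not run them.

```python
import os


def nodedev_name(pci_addr):
    """Map a PCI address such as '0000:01:00.0' to a libvirt node device name."""
    return "pci_" + pci_addr.replace(":", "_").replace(".", "_")


def iommu_group_members(pci_addr, sysfs_root="/sys"):
    """Return the sorted PCI addresses sharing an IOMMU group with pci_addr.

    sysfs_root is a parameter only so the function can be pointed at a
    fake tree; on a real host it is /sys.
    """
    group_dir = os.path.join(
        sysfs_root, "bus", "pci", "devices", pci_addr, "iommu_group", "devices"
    )
    return sorted(os.listdir(group_dir))


def detach_commands(pci_addr, sysfs_root="/sys"):
    """Build one 'virsh nodedev-detach' command per device in the group."""
    return [
        "virsh nodedev-detach " + nodedev_name(addr)
        for addr in iommu_group_members(pci_addr, sysfs_root)
    ]


if __name__ == "__main__":
    # Hypothetical address; replace with a device that exists on the host.
    for command in detach_commands("0000:01:00.0"):
        print(command)
```

After running the printed commands by hand, function 0 alone would be
listed as a hostdev with managed mode disabled in the domain XML.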
What I've implemented is a way of doing the detach/re-attach of the
whole IOMMU group for the user, if the hostdev is set with managed=true
(and perhaps I should also consider verifying the 'multifunction=yes'
attribute as well, for more clarity). I am not trying to assign all
the devices in the IOMMU group to the guest - not sure if that's what
you were talking about up there, but I'm happy to emphasize that's not
the case.
Now, yes, if the user is unaware of the consequences of detaching all
devices of the IOMMU group from the host, bad things can happen. If
that's what you're saying, fair enough. I can make an argument about
how we can't shield the user from his/her own 'unawareness' forever,
but in the end it's better to be on the safe side.
It may be that in *your* particular case, you understand that the
functions you don't want to assign to the guest are not otherwise
used, and it's not dangerous to suddenly detach them from their host
driver. But you can't assume that will always be the case.
If you *really* can't accept just assigning all the devices in that
IOMMU group to the guest (thus making them all explicitly listed in
the config, and obvious to the administrator that they won't be
available on the host) and simply not using them, then you either need
to separately detach those particular functions from the host, or come
up with a way of having the domain config explicitly list them as
"detached from the host but not actually attached to the guest".
I can live with that - it will automate the detach/re-attach process,
which is my goal here, and it forces the user to know exactly what is
going to be detached from the host, minimizing errors. If no one is
against adding an extra parameter 'unassigned=true' to the hostdev in
these cases, I can make this happen.
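To make the proposal concrete, here is a small sketch of what such a
config might look like and how the two kinds of hostdev could be told
apart. The 'unassigned' attribute is only the idea floated in this
thread, not existing libvirt syntax, and its placement on the <hostdev>
element is an assumption; the rest of the XML follows the usual PCI
hostdev form. The split_hostdevs() helper and the addresses are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Function 0 goes to the guest; functions 1-3 are only detached from the
# host. 'unassigned' is the PROPOSED attribute, not current libvirt XML.
DEVICES_XML = """
<devices>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </source>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes' unassigned='true'>
    <source>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
    </source>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes' unassigned='true'>
    <source>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x2'/>
    </source>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes' unassigned='true'>
    <source>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x3'/>
    </source>
  </hostdev>
</devices>
"""


def split_hostdevs(xml_text):
    """Return (assigned, detached_only) lists of PCI function numbers."""
    assigned, detached_only = [], []
    for hostdev in ET.fromstring(xml_text.strip()).iter("hostdev"):
        address = hostdev.find("source/address")
        function = int(address.get("function"), 16)
        if hostdev.get("unassigned") == "true":
            detached_only.append(function)
        else:
            assigned.append(function)
    return assigned, detached_only


if __name__ == "__main__":
    assigned, detached_only = split_hostdevs(DEVICES_XML)
    print("attached to guest:", assigned)
    print("detached from host only:", detached_only)
```

Listing every function explicitly keeps the administrator aware of
exactly which devices disappear from the host, which is the point of
the suggestion above.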
Thanks,
DHB
Now, there's a catch. Inside both virHostdevPreparePCIDevices() and
virHostdevReAttachPCIDevices() there is code to save/restore the
network configuration for SR-IOV devices. These functions iterate over
the hostdevs list, instead of the pcidevs list I'm changing. The final
result, given that the current conditions used for SR-IOV match the
conditions for multifunction PCI devices as well, is that not all
virtual functions will get their network configuration saved/restored.
If you're not going to use a device (which is implied by the fact that
it's not in the hostdevs list) then nothing about its network config
will change, so there is no reason to save/restore it.
For example, take a guest that uses 3 of 4 functions of a PCI
multifunction card, let's say functions 0, 1 and 3. The code will
handle the detach of the whole IOMMU group, including function 2,
which isn't declared in the XML.
Again, the above sentence implies that you're wanting to make
this completely automatic, which we previously decided was
something we didn't want to do.
However, since function 2 isn't a hostdev, its network config will not
be restored after the VM shutdown.
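The gap being described can be modelled in a few lines. This is a toy
model of the concern as stated, not the libvirt implementation: it
assumes the whole group ends up in the pcidevs list while the
save/restore code only walks the hostdevs list, and the function name
functions_without_restore() is invented for the illustration.

```python
def functions_without_restore(group_functions, hostdev_functions):
    """Return functions that get detached but are skipped by save/restore.

    group_functions: every function in the IOMMU group (the pcidevs list).
    hostdev_functions: the functions declared in the domain XML (hostdevs).
    """
    detached = set(group_functions)          # whole group is detached
    saved_restored = set(hostdev_functions)  # only hostdevs are visited
    return sorted(detached - saved_restored)


if __name__ == "__main__":
    # The example from the text: a 4-function card, guest uses 0, 1 and 3.
    print(functions_without_restore([0, 1, 2, 3], [0, 1, 3]))
```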
You're talking about something that will never occur - on every SRIOV
network card I've ever seen each VF is in its own IOMMU group, and can
be assigned to a guest independent of what's done with any other VF.
I've never seen a case (except maybe once with a newly released
motherboard that had broken IOMMU firmware(?)) where a VF was in the
same IOMMU group as any other device.
Now comes the question: how much effort should be spent on making the
network config of all the functions be restored? Is this a blocker for
the whole code to be accepted or, given it is properly documented
somewhere, can it be done later on?