> From: Michael Kelley <mhklinux@xxxxxxxxxxx>
> Sent: Tuesday, October 22, 2024 11:04 AM
> [...]
> I wasn't aware of the VF handling. Where does the guest notify the
> host that it is planning to hibernate? I went looking for such code, but
> couldn't immediately find it. Is it in the netvsc driver? Is this the
> sequence?
>
> 1) The guest notifies the host of the hibernate
> 2) The host sends a RESCIND_CHANNELOFFER message for each VF
> in the VM
> 3) The guest waits for all VF rescind processing to complete, and
> also must ensure that no new VFs get added in the meantime
> 4) Then the guest proceeds with the hibernation, knowing that there
> are no open channels for VF devices

When a hibernated VM resumes on a different host, it looks like the host
team thinks that it's difficult to remember the VMBus device Instance GUID
for the VF and to use the same GUID on the new host. When the new host
uses a new Instance GUID for the VF, a Windows VM panics, and a Linux VM
prints a warning and, IIRC, loses the ability to hibernate again due to a
check in the VMBus driver. So, as a workaround, the host team decided to
remove the VF(s) before asking the VM to hibernate.

The sequence of a "host-initiated VM hibernation" is:

1) A user clicks the "Hibernation" button on the portal (or uses the
   equivalent cmd line or API).
2) Internally, the host temporarily disables AccelNet for the vNICs,
   i.e. it sends PCI_EJECT and RESCIND_CHANNELOFFER for each VF.
3) The guest responds accordingly, including sending
   PCI_EJECTION_COMPLETE and CHANNELMSG_RELID_RELEASED.
4) Once the host knows that AccelNet has been disabled for the VM, the
   host sends a "please hibernate" message to the guest via the
   Shutdown IC.
5) The guest proceeds with the hibernation, knowing that there are no
   open channels for VF devices and assuming no new VF will be offered
   during the hibernation operation.
6) When the VM finishes hibernation and powers off, the host internally
   enables AccelNet for the VM so that when the VM resumes on a new
   host, the new host can offer a VF with a different VMBus device
   Instance GUID.

The above is for a "host-initiated VM hibernation". Currently, the Azure
team doesn't support a "VM-initiated hibernation", where the host has no
opportunity to temporarily disable AccelNet. Maybe "VM-initiated
hibernation" can be supported when MANA-Direct is used (i.e. no more
NetVSC NICs and there are only MANA VF NICs): in this scenario, I suppose
the host must remember the MANA VF's VMBus device Instance GUID and use
the same GUID on the new host.

> > The behavior we want is for the guest to hot remove the MLX device
> > driver on resume, even if the MLX device was still present at suspend,
> > so that the host does not need this special pre-hibernate behavior. This
> > patch series may not be sufficient to ensure this, though. It just moves
> > things in the right direction, by handling the all-offers-delivered
> > message.

I'm not sure if it's a good idea to add new code to try to remove a stale
MLX VF, since the scenario should not exist on Azure nowadays (currently
the host temporarily disables AccelNet during hibernation, so there should
be no stale MLX VF upon resume).

On a local Hyper-V host, after a VM hibernates, we can manually disable
AccelNet (i.e. NIC SR-IOV) for the VM, and the VM will see a stale,
unresponsive MLX VF upon resume. It would be tricky to clean up the VF
gracefully: we would have to wait for the resume callback in the Mellanox
VF driver to time out on the unresponsive VF (this can take 1 minute) and
then clean up the related VMBus pass-through device backing the VF. What
happens if a host-initiated or VM-initiated hibernation is triggered
during that 1 minute? I suspect there may be some tricky race condition
issues, e.g. we may need to figure out how to synchronize the .resume
callback with the .remove callback of the MLX driver.
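
To make the ordering of the host-initiated sequence above easier to reason
about, here is a toy user-space model of the six steps. It is only a
sketch: the enum, struct and function names are made up for illustration
and are not identifiers from the VMBus, netvsc or MLX drivers; only the
message names in the comments come from the sequence described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step names for this model only; not kernel identifiers. */
enum hib_step {
	HIB_USER_REQUEST,       /* 1) user asks the host to hibernate the VM */
	HIB_HOST_EJECT_VF,      /* 2) host: PCI_EJECT + RESCIND_CHANNELOFFER */
	HIB_GUEST_VF_RELEASED,  /* 3) guest: PCI_EJECTION_COMPLETE +
				 *    CHANNELMSG_RELID_RELEASED */
	HIB_HOST_ASK_HIBERNATE, /* 4) host: "please hibernate" via Shutdown IC */
	HIB_GUEST_HIBERNATE,    /* 5) guest hibernates with no VF channels */
	HIB_HOST_REENABLE_VF,   /* 6) host re-enables AccelNet after power off */
	HIB_DONE,
};

struct hib_model {
	enum hib_step next;     /* the only step that is accepted next */
	bool vf_present;        /* does the guest still have a VF? */
};

static void hib_model_init(struct hib_model *m)
{
	m->next = HIB_USER_REQUEST;
	m->vf_present = true;
}

/* Accept 'step' only if it is the expected next step; then advance. */
static bool hib_model_advance(struct hib_model *m, enum hib_step step)
{
	if (step != m->next || step == HIB_DONE)
		return false;

	if (step == HIB_GUEST_VF_RELEASED)
		m->vf_present = false;

	/*
	 * Redundant with the strict ordering above, but it states the
	 * invariant explicitly: never hibernate while a VF is present.
	 */
	if (step == HIB_GUEST_HIBERNATE && m->vf_present)
		return false;

	m->next = (enum hib_step)(step + 1);
	return true;
}

/* Replay a list of steps; stop at the first one that is out of order. */
static bool hib_replay(struct hib_model *m, const enum hib_step *steps,
		       size_t n)
{
	assert(steps != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (!hib_model_advance(m, steps[i]))
			return false;
	return true;
}
```

In this model, a "please hibernate" request that arrives before the guest
has released the VF is simply rejected. The model does not try to cover
the stale-VF case on a local Hyper-V host, where the .resume/.remove
synchronization question remains open.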

I think the general assumption is that the VM's configuration should not
change at all across hibernation, but it looks like this assumption turns
out to be false under some conditions from time to time... I hope the
assumption can always hold with OpenHCL.

Thanks,
Dexuan