From: Naman Jain <namjain@xxxxxxxxxxxxxxxxxxx> Sent: Sunday, November 10, 2024 9:44 PM
>
> On 11/7/2024 11:14 AM, Naman Jain wrote:
> >
> > On 11/1/2024 12:44 AM, Michael Kelley wrote:
> >> From: Naman Jain <namjain@xxxxxxxxxxxxxxxxxxx> Sent: Tuesday, October 29, 2024 1:02 AM
> >>> [snip]
> >>> @@ -2494,6 +2495,22 @@ static int vmbus_bus_resume(struct device *dev)
> >>>
> >>>  	vmbus_request_offers();
> >>>
> >>> +	mutex_lock(&vmbus_connection.channel_mutex);
> >>> +	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> >>> +		if (channel->offermsg.child_relid != INVALID_RELID)
> >>> +			continue;
> >>> +
> >>> +		/* hvsock channels are not expected to be present. */
> >>> +		if (is_hvsock_channel(channel))
> >>> +			continue;
> >>> +
> >>> +		pr_err("channel %pUl/%pUl not present after resume.\n",
> >>> +		       &channel->offermsg.offer.if_type,
> >>> +		       &channel->offermsg.offer.if_instance);
> >>> +		/* ToDo: Cleanup these channels here */
> >>> +	}
> >>> +	mutex_unlock(&vmbus_connection.channel_mutex);
> >>> +
> >>
> >> Dexuan and John have explained how in Azure VMs, there should not be
> >> any VFs assigned to the VM at the time of hibernation. So the above
> >> check for missing offers does not trigger an error message due to
> >> VF offers coming after the all-offers-received message.
> >>
> >> But what about the case of a VM running on a local Hyper-V? I'm not
> >> completely clear, but in that case I don't think any VFs are removed
> >> before the hibernation, especially for VM-initiated hibernation. It's
> >
> > I am not sure about this behavior. I have requested Dexuan offline
> > for a comment.
> >
> >> a reasonable scenario to later resume that same VM, with the same
> >> VF assigned to the VM. Because of the way current code counts
> >> the offers, vmbus_bus_resume() waits for the VF to be offered again,
> >> and all the channels get correct post-resume relids. But the changes
> >> in this patch set break that scenario.
> >> Since vmbus_bus_resume() now
> >> proceeds before the VF offer arrives, hv_pci_resume() calling
> >> vmbus_open() could use the pre-hibernation relid for the VF and break
> >> things. Certainly the "not present after resume" error message would
> >> be spurious.
> >>
> >> Maybe the focus here is Azure, and it's tolerable for the local Hyper-V
> >> case with a VF to not work pending later fixes. But I thought I'd call
> >> out the potential issue (assuming my thinking is correct).
> >>
> >> Michael
> >
> > IIUC, below scenarios can happen based on your comment-
> >
> > Case 1:
> > VF channel offer is received in time before hv_pci_resume() and there
> > are no problems.
> >
> > Case 2:
> > Resume proceeds just after getting ALLOFFERS_DELIVERED msg and a warning
> > is printed that this VF channel is not present after resume.
> > Then two scenarios can happen:
> > Case 2.1:
> > VF channel offer is received before hv_pci_resume() and things work
> > fine. Warning printed is spurious.
> > Case 2.2:
> > VM channel offer is not received before hv_pci_resume() and relid is
> > not yet restored by onoffer. This is a problem. Warning is printed in
> > this case for missing offer.
> >
> > I think it all depends on whether or not VFs are removed in local
> > HyperV VMs. I'll try to get this information. Thanks for pointing this
> > out.
> >
> > Regards,
> > Naman
> >
>
> Hi Michael,
> I discussed with Dexuan and we tried these scenarios. Here are the
> observations:
>
> For the two ways of host initiated hibernation:
>
> #1: Invoke-Hibernate $vm -Device (Uses the guest shutdown component)
> OR
> #2: Invoke-Hibernate $vm -ComputerSystem (Uses the RequestStateChange
> ability)

Question: What PowerShell module provides "Invoke-Hibernate"? It's not
present on my Windows 11 system that is running Hyper-V, and I can't
find any documentation about it on the web. Or maybe Invoke-Hibernate
is a PowerShell *script*?
>
> #1 does not remove the VF before sending the hibernate message to the VM
> via hv_utils, but #2 does.
>
> With both #1 and #2, during resume, the host offers the vPCI vmbus
> device before the vmbus_onoffers_delivered() is called. Whether or not
> VFs are removed doesn't matter here, because during resume the first
> fresh kernel always requests the VF device, meaning it has become a
> boot-time device when the 'old' kernel is resuming back. So the issue we
> are discussing will not happen in practice and the patch won't break
> things and won't print spurious warnings. If its OK, please let me know,
> I'll then proceed with v3.
>

Ah, this is interesting. I'm assuming these are the details:

1) VM boots with the intent of resuming from hibernation (though
   Hyper-V doesn't know about that intent)
2) Original fresh kernel is loaded and begins initialization
3) VMBus offers come in for boot-time devices, which excludes
   SR-IOV VFs.
4) ALLOFFERS_DELIVERED message comes in
5) The storvsc driver initializes for the virtual disks on the VM
6) Kernel initialization code finds and reads the swap space to see
   if a hibernation image is present. If so, it reads in the
   hibernation image.
7) The suspend sequence is initiated (just like during hibernation)
   to shut down the VMBus devices and terminate the VMBus connection.
8) Control is transferred to the previously read-in hibernation image
9) The hibernation image runs the resume sequence, which initiates
   a new VMBus connection and requests offers
10) VMBus offers come in for whatever VMBus devices were present
    when Step 7 initiated the suspend sequence. If a VF device was
    present at that time, an offer for that VF device will come in
    and will match up with the VF that was present in the VM at the
    time of hibernation.
11) ALLOFFERS_DELIVERED message comes in again for the newly
    initiated VMBus connection.

The netvsc driver gets initialized *after* step 4, but we don't know
exactly *when* relative to the storvsc driver.
The netvsc driver must tell Hyper-V that it can handle an SR-IOV VF,
and the VF offer is sent sometime after that. While this netvsc/VF
sequence is happening, the storvsc driver is reading the hibernation
image from swap (Step 6).

I think the sequence you describe works when reading the hibernation
image from swap takes tens of seconds, or even several minutes in an
Azure VM with a remote disk. That gives plenty of time for the VF to
get initialized and be fully present when Step 7 starts. But there's
no *guarantee* that the VF is initialized by then. It's also not clear
to me what action by the guest causes Hyper-V to treat the VF as
"added to the VM" so that in Step 10 the VF offer is sent before
ALLOFFERS_DELIVERED.

The sequence you describe also happens in an Azure VM, even if the VF
is removed before hibernation. When the VF offer arrives during
Step 10, it doesn't match with any VFs that were in the VM at the time
of hibernation. It's treated as a new device, just like it would be if
the offer arrived after ALLOFFERS_DELIVERED.

But it seems like there's still the risk of having a fast swap disk
and a small hibernation image that can be read in a shorter amount of
time than it takes to initialize the VF to the point that Hyper-V
treats it as added to the VM. Without knowing what that point is, it's
hard to assess the likelihood of that happening. Or maybe there's an
interlock I'm not aware of that ensures Step 7 can't proceed while the
netvsc/VF sequence is in progress.

So maybe it's best to proceed with this patch, and deal with the risk
later when/if it becomes reality. I'm OK if you want to do that.

This has been an interesting discussion that I'll try to capture in
some high-level documentation about how Linux guests on Hyper-V do
hibernation!

Michael
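
To make the ordering concern concrete, here is a minimal user-space
sketch in C of the post-resume check from the patch. It is not kernel
code. The struct, the SKETCH_INVALID_RELID constant, and both function
names are hypothetical stand-ins invented for illustration; only the
logic (skip channels that already have a relid, skip hvsock channels,
flag the rest) follows the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for the kernel's INVALID_RELID marker. */
#define SKETCH_INVALID_RELID UINT32_MAX

/* Hypothetical, heavily simplified stand-in for struct vmbus_channel. */
struct sketch_channel {
	uint32_t relid;  /* SKETCH_INVALID_RELID until a fresh offer arrives */
	bool is_hvsock;  /* hvsock channels are not expected after resume */
};

/* Model a post-resume offer arriving for one channel. */
void sketch_onoffer(struct sketch_channel *chans, size_t idx,
		    uint32_t new_relid)
{
	assert(chans != NULL);
	chans[idx].relid = new_relid;
}

/*
 * Model the check run once ALLOFFERS_DELIVERED has been seen: count the
 * non-hvsock channels that still have no post-resume relid. In the patch,
 * each such channel produces the "not present after resume" message.
 */
size_t sketch_count_missing(const struct sketch_channel *chans, size_t n)
{
	size_t missing = 0;

	for (size_t i = 0; i < n; i++) {
		if (chans[i].relid != SKETCH_INVALID_RELID)
			continue;
		if (chans[i].is_hvsock)
			continue;
		missing++;
	}
	return missing;
}
```

If the VF offer lands before the check runs, the count is zero and
nothing is reported. If the check runs first, the VF channel is
counted as missing, which corresponds to Case 2 above.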