On 11/7/2024 11:14 AM, Naman Jain wrote:
On 11/1/2024 12:44 AM, Michael Kelley wrote:
From: Naman Jain <namjain@xxxxxxxxxxxxxxxxxxx> Sent: Tuesday, October 29, 2024 1:02 AM
When resuming from hibernation, log any channels that were present
before hibernation but now are gone.
In general, the essential virtual devices configured for a VM, remain
same, before and after the hibernation and its not very common that
some offers are missing.
The wording here is a bit jumbled. And let's use consistent terminology.
I'd suggest:
In general, the boot-time devices configured for a resuming VM should be
the same as the devices in the VM at the time of hibernation. It's
uncommon
for the configuration to have been changed such that offers are missing.
Changing the configuration violates the rules for hibernation anyway.
Sure, I'll make the required changes.
The cleanup of missing channels is not straightforward and depends on
individual device driver functionality and implementation, so it can
be added in the future as separate changes.
Signed-off-by: John Starks <jostarks@xxxxxxxxxxxxx>
Co-developed-by: Naman Jain <namjain@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Naman Jain <namjain@xxxxxxxxxxxxxxxxxxx>
Reviewed-by: Easwar Hariharan <eahariha@xxxxxxxxxxxxxxxxxxx>
---
Changes since v1:
https://lore.kernel.org/all/20241018115811.5530-1-namjain@xxxxxxxxxxxxxxxxxxx/
* Added Easwar's Reviewed-By tag
* Addressed Saurabh's comments:
* Added a note for missing channel cleanup in comments and commit msg
---
drivers/hv/vmbus_drv.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index bd3fc41dc06b..08214f28694a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2462,6 +2462,7 @@ static int vmbus_bus_suspend(struct device *dev)
static int vmbus_bus_resume(struct device *dev)
{
+ struct vmbus_channel *channel;
struct vmbus_channel_msginfo *msginfo;
size_t msgsize;
int ret;
@@ -2494,6 +2495,22 @@ static int vmbus_bus_resume(struct device *dev)
vmbus_request_offers();
+ mutex_lock(&vmbus_connection.channel_mutex);
+ list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
+ if (channel->offermsg.child_relid != INVALID_RELID)
+ continue;
+
+ /* hvsock channels are not expected to be present. */
+ if (is_hvsock_channel(channel))
+ continue;
+
+ pr_err("channel %pUl/%pUl not present after resume.\n",
+ &channel->offermsg.offer.if_type,
+ &channel->offermsg.offer.if_instance);
+ /* ToDo: Cleanup these channels here */
+ }
+ mutex_unlock(&vmbus_connection.channel_mutex);
+
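For illustration only (this is not part of the patch), the deferred
ToDo could eventually look roughly like the sketch below. The helper
name is made up, and it glosses over the locking and per-driver
teardown ordering that make the real cleanup non-trivial, as the
commit message notes:

/*
 * Hypothetical sketch of the deferred cleanup: walk the channel list
 * again after resume and unregister the devices whose offers never
 * came back. Real code would need to drop channel_mutex around the
 * unregister and honor each driver's teardown requirements.
 */
static void vmbus_cleanup_missing_channels(void)
{
	struct vmbus_channel *channel, *tmp;

	list_for_each_entry_safe(channel, tmp, &vmbus_connection.chn_list,
				 listentry) {
		if (channel->offermsg.child_relid != INVALID_RELID ||
		    is_hvsock_channel(channel))
			continue;

		if (channel->device_obj)
			vmbus_device_unregister(channel->device_obj);
	}
}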
Dexuan and John have explained how, in Azure VMs, there should not be
any VFs assigned to the VM at the time of hibernation. So the above
check for missing offers does not trigger an error message due to
VF offers arriving after the all-offers-received message.
But what about the case of a VM running on a local Hyper-V? I'm not
completely clear, but in that case I don't think any VFs are removed
before the hibernation, especially for VM-initiated hibernation.
I am not sure about this behavior. I have requested Dexuan offline
for a comment.
It's a reasonable scenario to later resume that same VM, with the same
VF assigned to the VM. Because of the way current code counts
the offers, vmbus_bus_resume() waits for the VF to be offered again,
and all the channels get correct post-resume relids. But the changes
in this patch set break that scenario. Since vmbus_bus_resume() now
proceeds before the VF offer arrives, hv_pci_resume() calling
vmbus_open() could use the pre-hibernation relid for the VF and break
things. Certainly the "not present after resume" error message would
be spurious.
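To make the hazard concrete, here is a rough sketch (not existing
code; the helper name is invented) of the kind of guard a driver's
resume path would need before trusting its saved channel, using the
same INVALID_RELID marker the new loop checks:

/* Sketch only: has the host re-offered this channel after resume? */
static bool vmbus_channel_reoffered(const struct vmbus_channel *channel)
{
	return READ_ONCE(channel->offermsg.child_relid) != INVALID_RELID;
}

Without such a check (or the old behavior of waiting for all
pre-hibernation offers), vmbus_open() can be handed a stale relid.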
Maybe the focus here is Azure, and it's tolerable for the local Hyper-V
case with a VF to not work pending later fixes. But I thought I'd call
out the potential issue (assuming my thinking is correct).
Michael
IIUC, the below scenarios can happen based on your comment:
Case 1:
VF channel offer is received in time before hv_pci_resume() and there
are no problems.
Case 2:
Resume proceeds just after getting the ALLOFFERS_DELIVERED message, and
a warning is printed that the VF channel is not present after resume.
Then two scenarios can happen:
Case 2.1:
VF channel offer is received before hv_pci_resume() and things work
fine. Warning printed is spurious.
Case 2.2:
VF channel offer is not received before hv_pci_resume() and the relid
is not yet restored by onoffer. This is a problem. The warning for the
missing offer is printed in this case.
I think it all depends on whether or not VFs are removed in local
Hyper-V VMs. I'll try to get this information. Thanks for pointing this
out.
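If it turns out that a VF can stay assigned across a local Hyper-V
hibernation, one possible mitigation for Case 2.2 (purely a sketch,
untested; the function name and timeout are made up) would be a bounded
wait in the PCI front-end's resume path for the relid to be restored by
onoffer:

/* Sketch: poll until the host re-offers the channel, or give up. */
static int hv_pci_wait_for_reoffer(struct vmbus_channel *channel,
				   unsigned long timeout_ms)
{
	unsigned long deadline = jiffies + msecs_to_jiffies(timeout_ms);

	while (READ_ONCE(channel->offermsg.child_relid) == INVALID_RELID) {
		if (time_after(jiffies, deadline))
			return -ETIMEDOUT;
		msleep(20);
	}

	return 0;
}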
Regards,
Naman
Hi Michael,
I discussed with Dexuan and we tried these scenarios. Here are the
observations:
For the two ways of host-initiated hibernation:
#1: Invoke-Hibernate $vm -Device (Uses the guest shutdown component)
OR
#2: Invoke-Hibernate $vm -ComputerSystem (Uses the RequestStateChange
ability)
#1 does not remove the VF before sending the hibernate message to the VM
via hv_utils, but #2 does.
With both #1 and #2, during resume, the host offers the vPCI vmbus
device before vmbus_onoffers_delivered() is called. Whether or not
VFs are removed doesn't matter here, because during resume the fresh
kernel that boots first always requests the VF device, meaning it has
become a boot-time device by the time the 'old' kernel is resumed. So
the issue we are discussing will not happen in practice, and the patch
won't break things or print spurious warnings. If that's OK, please let
me know and I'll proceed with v3.
Thanks,
Naman
/* Reset the event for the next suspend. */
reinit_completion(&vmbus_connection.ready_for_suspend_event);
--
2.34.1