RE: hv_utils PTP support and hypervisor suspend/resume

Thanks, Michael - I'd convinced myself that the _behavior_ was consistent with the 600 second magic number even though I couldn't see from the control flow exactly how it would apply.  We'll try this and report back.

Thor

-----Original Message-----
From: Michael Kelley <mikelley@xxxxxxxxxxxxx> 
Sent: Wednesday, February 24, 2021 5:39 PM
To: Thor Simon <Thor.Simon@xxxxxxxxxxxx>; linux-hyperv@xxxxxxxxxxxxxxx
Subject: RE: hv_utils PTP support and hypervisor suspend/resume

From: Thor Simon <Thor.Simon@xxxxxxxxxxxx> Sent: Wednesday, February 24, 2021 10:00 AM
> 
> The TimeSync support in hv_utils presently has a "fail safe" limit of 
> 600 seconds.  I'm sure in a datacenter server context, where the 
> hypervisor's time is expected to be tightly controlled - and continuous - this is sensible.
> 
> Unfortunately, this causes linux VMs to lose time sync unrecoverably 
> in the not-uncommon case where the hypervisor's running on a laptop or 
> desktop system that is suspended (or hibernated) and resumed.
> 
> Does Hyper-V provide any interface by which we could detect this has 
> occurred and override the test for time too far out of sync?  Or, if 
> not, would adding a module option to suppress the test be acceptable?

There is a known bug with 5.8 and earlier kernel versions that can cause Linux timesync with the Hyper-V host to get hung, so that the timesync stops happening.  The problem can occur after the Hyper-V host is hibernated and resumed, or if the guest is paused and resumed. The known problem is fixed by this commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/hv/hv_util.c?id=b46b4a8a57c377b72a98c7930a9f6969d2d4784e

I've just reviewed the code again, and I don't think the 600 second "fail safe" limit is coming into play in the scenario you describe.  With the above patch in place, once Hyper-V resumes from hibernation, the first timesync packet it sends will set the host_ts.ref_time value to a very current time.  The ICTIMESYNCFLAG_SYNC flag will also be set, so hv_set_host_time() is called via the work_struct adj_time_work.  hv_set_host_time() will call hv_get_adj_host_time(), which will find that host_ts.ref_time is very close to the value from hv_read_reference_counter().  So the 600 second test won't be triggered.
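
For illustration only, the staleness test discussed here can be modeled in a few lines of userspace C. This is a rough sketch and not the driver source: the struct and function names (host_sample, sample_is_stale, adjusted_host_time) and the STALE_THRESH_NS macro are invented for the example, and the only facts assumed are the 600 second limit mentioned in this thread and that Hyper-V reference time counts in 100 ns units.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hyper-V reference time is expressed in 100 ns ticks. */
#define NS_PER_TICK      100ULL
#define NSEC_PER_SEC     1000000000ULL
/* Hypothetical name for the 600 second "fail safe" limit, in ns. */
#define STALE_THRESH_NS  (600ULL * NSEC_PER_SEC)

/* Hypothetical stand-in for the saved host sample (cf. host_ts). */
struct host_sample {
    uint64_t host_time; /* host wall time when sampled, in ticks */
    uint64_t ref_time;  /* reference counter when sampled, in ticks */
};

/* True if the last host sample is older than the threshold. */
static bool sample_is_stale(const struct host_sample *s, uint64_t ref_now)
{
    uint64_t age_ticks = ref_now - s->ref_time;

    return age_ticks * NS_PER_TICK > STALE_THRESH_NS;
}

/* Host time advanced by the ticks elapsed since the sample was taken. */
static uint64_t adjusted_host_time(const struct host_sample *s,
                                   uint64_t ref_now)
{
    return s->host_time + (ref_now - s->ref_time);
}
```

In this model, a sample refreshed by a new timesync packet has ref_time close to the current counter value, so sample_is_stale() returns false, which is the situation described above.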

So my guess is that you are experiencing the known bug that I initially described.
But let me know if I'm misunderstanding, or if you are seeing a failure path that I'm missing.

Michael



