Thanks, Michael - I'd convinced myself that the _behavior_ was
consistent with the 600 second magic number even though I couldn't see
from the control flow exactly how it would apply.

We'll try this and report back.

Thor

-----Original Message-----
From: Michael Kelley <mikelley@xxxxxxxxxxxxx>
Sent: Wednesday, February 24, 2021 5:39 PM
To: Thor Simon <Thor.Simon@xxxxxxxxxxxx>; linux-hyperv@xxxxxxxxxxxxxxx
Subject: RE: hv_utils PTP support and hypervisor suspend/resume

From: Thor Simon <Thor.Simon@xxxxxxxxxxxx> Sent: Wednesday, February 24, 2021 10:00 AM
>
> The TimeSync support in hv_utils presently has a "fail safe" limit of
> 600 seconds. I'm sure in a datacenter server context, where the
> hypervisor's time is expected to be tightly controlled - and
> continuous - this is sensible.
>
> Unfortunately, this causes linux VMs to lose time sync unrecoverably
> in the not-uncommon case where the hypervisor's running on a laptop or
> desktop system that is suspended (or hibernated) and resumed.
>
> Does Hyper-V provide any interface by which we could detect this has
> occurred and override the test for time too far out of sync? Or, if
> not, would adding a module option to suppress the test be acceptable?

There is a known bug with 5.8 and earlier kernel versions that can cause
Linux timesync with the Hyper-V host to get hung, so that the timesync
stops happening. The problem can occur after the Hyper-V host is
hibernated and resumed, or if the guest is paused and resumed. The known
problem is fixed by this commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/hv/hv_util.c?id=b46b4a8a57c377b72a98c7930a9f6969d2d4784e

I've just reviewed the code again, and I don't think the 600 second
"fail safe" limit is coming into play in the scenario you describe.
With the above patch in place, after Hyper-V is resumed after
hibernation, the first timesync packet sent by Hyper-V will set the
host_ts.ref_time value to a very current time.
The ICTIMESYNCFLAG_SYNC flag will also be set, so hv_set_host_time() is
called via work_struct adj_time_work. hv_set_host_time() will call
hv_get_adj_host_time(), which will find that host_ts.ref_time is very
close to the value from hv_read_reference_counter(). So the 600 second
test won't be triggered.

So my guess is that you are experiencing the known bug that I initially
described. But let me know if I'm misunderstanding, or if you are seeing
a failure path that I'm missing.

Michael
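
For anyone following the control flow described above, here is a minimal
userspace sketch of the staleness test. It is not the hv_util.c code.
The struct and function names (host_sample, sample_is_stale,
adjusted_host_time) are hypothetical, and the exact form of the
comparison is an assumption. What it relies on from the thread is only
that the last host sample is compared against the current reference
counter, and that the limit is 600 seconds. The reference counter is
taken to tick in 100 ns units.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the reference counter advances in 100 ns ticks. */
#define TICKS_PER_SEC         10000000ULL
/* The 600 second "fail safe" limit, expressed in ticks. */
#define STALE_THRESHOLD_TICKS (600ULL * TICKS_PER_SEC)

/* Hypothetical stand-in for the driver's saved host timestamp. */
struct host_sample {
	uint64_t host_time; /* host time carried by the last packet, in ticks */
	uint64_t ref_time;  /* reference counter value when it arrived */
};

/* Ticks elapsed since the sample; clamped to 0 if the counter appears
 * to have gone backwards (a choice made for this sketch only). */
static uint64_t elapsed_ticks(const struct host_sample *s, uint64_t ref_now)
{
	return ref_now >= s->ref_time ? ref_now - s->ref_time : 0;
}

/* True when the last host sample is older than the 600 second limit. */
static bool sample_is_stale(const struct host_sample *s, uint64_t ref_now)
{
	return elapsed_ticks(s, ref_now) > STALE_THRESHOLD_TICKS;
}

/* Host time extrapolated forward by the ticks elapsed since the sample. */
static uint64_t adjusted_host_time(const struct host_sample *s, uint64_t ref_now)
{
	return s->host_time + elapsed_ticks(s, ref_now);
}
```

In this model, a packet that arrives right after resume refreshes
ref_time, so the elapsed value seen by the next adjustment is small and
the limit is not reached. The limit would only matter if no fresh packet
had been recorded for more than 600 seconds of reference-counter time.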