From: Tianyu Lan <lantianyu1986@xxxxxxxxx> Sent: Tuesday, July 30, 2019 6:41 AM
>
> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
> >
> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
> >
> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
> > >> lantianyu1986@xxxxxxxxx writes:
> > >>
> > >> > From: Tianyu Lan <Tianyu.Lan@xxxxxxxxxxxxx>
> > >> >
> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> > >> > on x86. But native_sched_clock() directly uses the raw TSC value, which
> > >> > can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock()
> > >> > to set the sched clock function appropriately. On x86, this sets
> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> > >> > scaled and adjusted to be continuous.
> > >>
> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use
> > >> MSR-based clocksource but using it as sched_clock() can be very slow,
> > >> I'm afraid.
> > >>
> > >> On the other hand, what we have now is probably worse: TSC can,
> > >> actually, jump backwards (e.g. on migration) and we're breaking the
> > >> requirements for sched_clock().
> > >
> > > That (obviously) also breaks the requirements for using TSC as
> > > clocksource.
> > >
> > > IOW, it breaks the entire purpose of having TSC in the first place.
> >
> > Currently, we mark raw TSC as unstable when running on Hyper-V (see
> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
> > instead. The problem is that 'TSC page' can be disabled by the
> > hypervisor and in that case the only remaining clocksource is MSR-based
> > (slow).
> >
>
> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and
> kernel uses MSR based clocksource. Each MSR read will trigger one
> VM-EXIT. This also happens on other hypervisors (e.g. KVM doesn't
> expose KVM clock). Hypervisor should take this into account and
> determine which clocksource should be exposed or not.
>

We've confirmed with the Hyper-V team that the TSC page is always
available on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the
physical hardware presents an InvariantTSC. But the Linux Kconfigs
are set up so the TSC page is not used for 32-bit guests -- all clock
reads are synthetic MSR reads. For 32-bit, this set of changes will
add more overhead because the sched clock reads will now be MSR reads.

I would be inclined to fix the problem, even with the perf hit on
32-bit Linux. I don't have any data on 32-bit Linux being used in a
Hyper-V guest, but it's not supported in Azure, so usage is presumably
pretty small. The alternative would be to continue to use the raw TSC
value on 32-bit, even with the risk of a discontinuity in case of live
migration or similar scenarios.

Michael
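
For anyone following along, the 'TSC page' calculation mentioned in the
thread (TSC * scale + offset) can be sketched roughly as below. This is
only an illustration, not the kernel code: the struct, its "enabled"
field and the function name are hypothetical, and it assumes the scale
is a 64.64 fixed-point multiplier, so the product is shifted right by 64
bits before the offset is added. It also relies on the GCC/Clang
__int128 extension and leaves out the sequence-number retry loop a real
reader would need.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for the hypervisor-provided
 * reference TSC page. Field names are illustrative only.
 */
struct ref_tsc_page {
	int      enabled;  /* 0: page not usable, caller must fall back (e.g. to the MSR) */
	uint64_t scale;    /* assumed 64.64 fixed-point multiplier */
	int64_t  offset;   /* added after scaling */
};

/*
 * Convert a raw TSC value to reference time:
 *     time = ((tsc * scale) >> 64) + offset
 * Returns 1 and stores the result in *out if the page is usable,
 * or 0 if the caller has to use the slow path instead.
 */
static int ref_tsc_to_time(const struct ref_tsc_page *pg, uint64_t tsc,
			   uint64_t *out)
{
	unsigned __int128 prod;

	if (!pg->enabled)
		return 0;

	/* 128-bit intermediate so the multiplication cannot overflow */
	prod = (unsigned __int128)tsc * pg->scale;
	*out = (uint64_t)(prod >> 64) + (uint64_t)pg->offset;
	return 1;
}
```

The point of the indirection is that the hypervisor can change scale
and offset (for example across a migration) so that the computed time
stays continuous even when the raw TSC does not.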