From: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> Sent: Tuesday, August 13, 2019 1:34 AM > > Michael Kelley <mikelley@xxxxxxxxxxxxx> writes: > > > From: Tianyu Lan <lantianyu1986@xxxxxxxxx> Sent: Tuesday, July 30, 2019 6:41 AM > >> > >> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote: > >> > > >> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes: > >> > > >> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote: > >> > >> lantianyu1986@xxxxxxxxx writes: > >> > >> > >> > >> > From: Tianyu Lan <Tianyu.Lan@xxxxxxxxxxxxx> > >> > >> > > >> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock > >> > >> > on x86. But native_sched_clock() directly uses the raw TSC value, which > >> > >> > can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock() > >> > >> > to set the sched clock function appropriately. On x86, this sets > >> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is > >> > >> > scaled and adjusted to be continuous. > >> > >> > >> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use > >> > >> MSR-based clocksource but using it as sched_clock() can be very slow, > >> > >> I'm afraid. > >> > >> > >> > >> On the other hand, what we have now is probably worse: TSC can, > >> > >> actually, jump backwards (e.g. on migration) and we're breaking the > >> > >> requirements for sched_clock(). > >> > > > >> > > That (obviously) also breaks the requirements for using TSC as > >> > > clocksource. > >> > > > >> > > IOW, it breaks the entire purpose of having TSC in the first place. > >> > > >> > Currently, we mark raw TSC as unstable when running on Hyper-V (see > >> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used > >> > instead. The problem is that 'TSC page' can be disabled by the > >> > hypervisor and in that case the only remaining clocksource is MSR-based > >> > (slow). > >> > > >> > >> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and > >> kernel uses MSR based > >> clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other > >> hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should > >> take this into > >> account and determine which clocksource should be exposed or not. > >> > > > > We've confirmed with the Hyper-V team that the TSC page is always available > > on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical > > hardware presents an InvariantTSC. > > Currently we check that TSC page is valid on every read and it seems > this is redundant, right? It is either available on boot or not. I can > only imagine migrating a VM to a non-InvariantTSC host when Hyper-V will > likely disable the page (and we can get reenlightenment notification > then). I think Hyper-V can have brief intervals when the TSC page is not valid, so the code checks for the "sequence" value being zero. Otherwise, yes, it should always be there or not be there. Is there some other validity check on every read that you are thinking of? > > > But the Linux Kconfig's are set up so > > the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR > > reads. For 32-bit, this set of changes will add more overhead because the > > sched clock reads will now be MSR reads. > > > > I would be inclined to fix the problem, even with the perf hit on 32-bit Linux. > > I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not > > supported in Azure so usage is pretty small. The alternative would be to continue > > to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of > > live migration or similar scenarios. > > The issue needs fixing, I agree, however using MSR based clocksource as > sched clock may give us too big of a performance hit (not sure who cares > about 32 bit guest performance nowadays but still). What stops us from > enabling TSC page for 32 bit guests if it is available? I talked to KY Srinivasan for any history about TSC page on 32-bit. He said there was no technical reason not to implement it, but our focus was always 64-bit Linux, so the 32-bit was much less important. Also, on 32-bit Linux, the required 64x64 multiply and shift is more complex and takes more more cycles (compare 32-bit implementation of mul_u64_u64_shr vs. the 64-bit implementation), so the win over a MSR read is less. I don't know of any actual measurements being made to compare vs. MSR read. Michael > > -- > Vitaly