Michael Kelley <mikelley@xxxxxxxxxxxxx> writes: > From: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> Sent: Tuesday, August 13, 2019 1:34 AM >> >> Michael Kelley <mikelley@xxxxxxxxxxxxx> writes: >> >> > From: Tianyu Lan <lantianyu1986@xxxxxxxxx> Sent: Tuesday, July 30, 2019 6:41 AM >> >> >> >> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote: >> >> > >> >> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes: >> >> > >> >> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote: >> >> > >> lantianyu1986@xxxxxxxxx writes: >> >> > >> >> >> > >> > From: Tianyu Lan <Tianyu.Lan@xxxxxxxxxxxxx> >> >> > >> > >> >> > >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock >> >> > >> > on x86. But native_sched_clock() directly uses the raw TSC value, which >> >> > >> > can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock() >> >> > >> > to set the sched clock function appropriately. On x86, this sets >> >> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is >> >> > >> > scaled and adjusted to be continuous. >> >> > >> >> >> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use >> >> > >> MSR-based clocksource but using it as sched_clock() can be very slow, >> >> > >> I'm afraid. >> >> > >> >> >> > >> On the other hand, what we have now is probably worse: TSC can, >> >> > >> actually, jump backwards (e.g. on migration) and we're breaking the >> >> > >> requirements for sched_clock(). >> >> > > >> >> > > That (obviously) also breaks the requirements for using TSC as >> >> > > clocksource. >> >> > > >> >> > > IOW, it breaks the entire purpose of having TSC in the first place. >> >> > >> >> > Currently, we mark raw TSC as unstable when running on Hyper-V (see >> >> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used >> >> > instead. The problem is that 'TSC page' can be disabled by the >> >> > hypervisor and in that case the only remaining clocksource is MSR-based >> >> > (slow). >> >> > >> >> >> >> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and >> >> kernel uses MSR based >> >> clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other >> >> hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should >> >> take this into >> >> account and determine which clocksource should be exposed or not. >> >> >> > >> > We've confirmed with the Hyper-V team that the TSC page is always available >> > on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical >> > hardware presents an InvariantTSC. >> >> Currently we check that TSC page is valid on every read and it seems >> this is redundant, right? It is either available on boot or not. I can >> only imagine migrating a VM to a non-InvariantTSC host when Hyper-V will >> likely disable the page (and we can get reenlightenment notification >> then). > > I think Hyper-V can have brief intervals when the TSC page is not valid, so > the code checks for the "sequence" value being zero. Otherwise, yes, it > should always be there or not be there. Is there some other validity > check on every read that you are thinking of? > No, it's this one. In case these 'invalidity periods' are real there's nothing to improve in the current code. >> >> > But the Linux Kconfig's are set up so >> > the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR >> > reads. For 32-bit, this set of changes will add more overhead because the >> > sched clock reads will now be MSR reads. >> > >> > I would be inclined to fix the problem, even with the perf hit on 32-bit Linux. >> > I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not >> > supported in Azure so usage is pretty small. The alternative would be to continue >> > to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of >> > live migration or similar scenarios. >> >> The issue needs fixing, I agree, however using MSR based clocksource as >> sched clock may give us too big of a performance hit (not sure who cares >> about 32 bit guest performance nowadays but still). What stops us from >> enabling TSC page for 32 bit guests if it is available? > > I talked to KY Srinivasan for any history about TSC page on 32-bit. He said > there was no technical reason not to implement it, but our focus was always > 64-bit Linux, so the 32-bit was much less important. Also, on 32-bit Linux, > the required 64x64 multiply and shift is more complex and takes more > more cycles (compare 32-bit implementation of mul_u64_u64_shr vs. > the 64-bit implementation), so the win over a MSR read is less. I > don't know of any actual measurements being made to compare vs. > MSR read. VMExit is 1000 CPU cycles or so, I would guess that TSC page calculations are better. Let me try to build 32bit kernel and do some quick measurements. -- Vitaly