On Mon, Nov 30, 2020 at 5:36 AM Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
>
> Hi!
>
> This is the first version of the work to make TSC migration more accurate,
> as was defined by Paulo at:
> https://www.spinics.net/lists/kvm/msg225525.html
>
> I have a few thoughts about the kvm masterclock synchronization,
> which relate to Paulo's proposal that I implemented.
>
> The idea of masterclock is that when the host TSC is synchronized
> (or as the kernel calls it, stable), and the guest TSC is synchronized as well,
> then we can base the kvmclock on the same pair of
> (host time in nsec, host tsc value) for all vCPUs.
>
> This makes the random error in calculation of this value invariant
> across vCPUs, and allows the guest to do kvmclock calculation in userspace
> (vDSO) since kvmclock parameters are vCPU invariant.
>
> To ensure that the guest tsc is synchronized we currently track host/guest tsc
> writes, and enable the master clock only when roughly the same guest's TSC value
> was written across all vCPUs.
>
> Recently this was disabled by Paulo and I agree with this, because I think
> that we indeed should only make the guest TSC synchronized by default
> (including new hotplugged vCPUs) and not do any tsc synchronization beyond that.
> (Trying to guess when the guest syncs the TSC can cause more harm than good).
>
> Besides, Linux guests don't sync the TSC via IA32_TSC write,
> but rather use IA32_TSC_ADJUST which currently doesn't participate
> in the tsc sync heuristics.
> And as far as I know, Linux guest is the primary (only?) user of the kvmclock.
>
> I *do think* however that we should redefine KVM_CLOCK_TSC_STABLE
> in the documentation to state that it only guarantees invariance if the guest
> doesn't mess with its own TSC.
>
> Also I think we should consider enabling the X86_FEATURE_TSC_RELIABLE
> in the guest kernel, when kvm is detected, to keep the guest from even trying
> to sync TSC on newly hotplugged vCPUs.
>
> (The guest doesn't end up touching TSC_ADJUST usually, but it still might
> in some cases due to scheduling of guest vCPUs)
>
> (X86_FEATURE_TSC_RELIABLE short circuits tsc synchronization on CPU hotplug,
> and TSC clocksource watchdog, and the latter we might want to keep).

If you're going to change the guest behavior to be more trusting of the
host, I think the host should probably signal this to the guest using
a new bit.
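
To make the "new bit" idea concrete, here is a rough sketch of what the
guest-side check could look like. It assumes the bit would live in the
KVM paravirt feature word (CPUID leaf 0x40000001, EAX), next to the
existing KVM_FEATURE_CLOCKSOURCE_STABLE_BIT. The name
HYP_KVM_FEATURE_TSC_TRUSTED_BIT, its number, and the helper
guest_should_trust_tsc() are made up for illustration and are not part
of any existing KVM ABI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Existing bit in the KVM paravirt feature word (CPUID 0x40000001, EAX). */
#define KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24

/*
 * HYPOTHETICAL: stand-in for the "new bit" suggested above. Both the name
 * and the bit number are placeholders chosen for this sketch only.
 */
#define HYP_KVM_FEATURE_TSC_TRUSTED_BIT 30

/*
 * Return true only if the host explicitly opted in via the hypothetical bit.
 * An older host that sets only the existing clocksource-stable bit leaves
 * the guest on its current, less trusting behavior.
 */
static bool guest_should_trust_tsc(uint32_t kvm_features_eax)
{
    return (kvm_features_eax >> HYP_KVM_FEATURE_TSC_TRUSTED_BIT) & 1u;
}
```

With something like this, a guest kernel would only force
X86_FEATURE_TSC_RELIABLE when the helper returns true, so running a new
guest on an old host does not silently change behavior.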