On Wed, Oct 3, 2018 at 8:10 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Wed, 3 Oct 2018, Andy Lutomirski wrote:
> > > On Oct 3, 2018, at 5:01 AM, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
> > > Not all Hyper-V hosts support reenlightenment notifications (and, if I'm
> > > not mistaken, you need to enable nesting for the VM to get the feature -
> > > and most VMs don't have this) so I think we'll have to keep Hyper-V
> > > vclock for the time being.
> > >
> > But this does suggest that the correct way to pass a clock through to an
> > L2 guest where L0 is HV is to make L1 use the “tsc” clock and L2 use
> > kvmclock (or something newer and better). This would require adding
> > support for atomic frequency changes all the way through the timekeeping
> > and arch code.
> >
> > John, tglx, would that be okay or crazy?
>
> Not sure what you mean. I think I lost you somewhere on the way.
>

What I mean is: currently we have a clocksource called
"hyperv_clocksource_tsc_page".  Reading it does:

static u64 read_hv_clock_tsc(struct clocksource *arg)
{
	u64 current_tick = hv_read_tsc_page(tsc_pg);

	if (current_tick == U64_MAX)
		rdmsrl(HV_X64_MSR_TIME_REF_COUNT, current_tick);

	return current_tick;
}

From Vitaly's email, it sounds like, on most (all?) hyperv systems
with nesting enabled, this clock is better behaved than it appears.
It sounds like the read behavior is that current_tick will never be
U64_MAX -- instead, the clock always works and, more importantly, the
actual scaling factor and offset only change observably on *guest*
request.  So why don't we improve the actual "tsc" clocksource to
understand this?
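For reference, here is a minimal userspace sketch of the read side
described above.  It assumes the layout documented for the Hyper-V
reference TSC page (a sequence number, a 64-bit scale and a signed
offset) and the formula time = ((tsc * scale) >> 64) + offset.  The
struct and function names are hypothetical, the TSC value is passed in
rather than read with RDTSC, and the volatile loads only stand in for
the barriers real code needs.  It is not the kernel's
hv_read_tsc_page().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the page the hypervisor shares with the guest. */
struct ref_tsc_page {
	volatile uint32_t sequence;	/* 0 means "page invalid, fall back" */
	volatile uint64_t scale;	/* 64.64 fixed-point multiplier */
	volatile int64_t offset;	/* added after scaling */
};

#define REF_TIME_INVALID UINT64_MAX	/* plays the role of U64_MAX above */

/*
 * Return the reference time for a given raw TSC value, or
 * REF_TIME_INVALID if the page says it cannot be used.  The loop
 * retries if the sequence changed while scale/offset were being read.
 * unsigned __int128 is a GCC/Clang extension.
 */
static inline uint64_t ref_tsc_page_read(const struct ref_tsc_page *pg,
					 uint64_t tsc)
{
	uint32_t seq;
	uint64_t scale;
	int64_t offset;

	assert(pg != NULL);

	do {
		seq = pg->sequence;
		if (seq == 0)
			return REF_TIME_INVALID;
		scale = pg->scale;
		offset = pg->offset;
	} while (pg->sequence != seq);

	return (uint64_t)(((unsigned __int128)tsc * scale) >> 64) +
	       (uint64_t)offset;
}
```

The point of the sketch is that scale and offset are the only inputs
besides the raw TSC, which is why a clock whose scale and offset change
only on guest request looks a lot like a plain "tsc" clocksource with
an occasional rate change.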
ISTM the best model would be where the __clocksource_update_freq_xyz()
mechanism gets called so we can use it like this:

clocksource_begin_update();
clocksource_update_mult_shift();
tell_hv_that_we_reenlightened();
clocksource_end_update();

Where clocksource_begin_update() bumps the seqcount for the vDSO and
takes all the locks, clocksource_update_mult_shift() updates
everything, and clocksource_end_update() makes the updated parameters
usable.

(AFAICT there are currently no clocksources at all in the entire
kernel that update their frequency on the fly using
__clocksource_update_freq_xyz().  Unless I'm missing something, the
x86 tsc cpufreq hooks don't call into the core timekeeping at all, so
I'm assuming that the tsc clocksource is just unusable as a
clocksource on systems that change its frequency.)

Or we could keep the hyperv_clocksource_tsc_page clocksource but make
it use VCLOCK_TSC and a similar update mechanism.

I don't personally want to do this, because the timekeeping code is
subtle and I'm unfamiliar with it.  And I don't have *that* many spare
cycles :)
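To make the begin/update/end idea concrete, here is a rough userspace
sketch of a seqcount-protected mult/shift change.  Everything in it is
hypothetical: the struct, the clock_* names and the single-writer
assumption are made up for illustration, none of it is existing kernel
API, and it ignores the locking that the real timekeeping core would
need.  The data fields are plain variables, so truly concurrent use
would need READ_ONCE-style accessors or relaxed atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical parameter block; the kernel's real state lives elsewhere. */
struct clock_params {
	atomic_uint seq;	/* odd while an update is in progress */
	uint32_t mult;
	uint32_t shift;
	uint64_t cycle_last;	/* counter value at the last update */
	uint64_t ns_base;	/* nanoseconds accumulated up to cycle_last */
};

/* Can overflow for large deltas; good enough for a sketch. */
static inline uint64_t cycles_to_ns(uint64_t delta, uint32_t mult,
				    uint32_t shift)
{
	return (delta * mult) >> shift;
}

/* Reader: retry if an update was in flight or finished while reading. */
static inline uint64_t clock_read_ns(struct clock_params *p, uint64_t cycles)
{
	unsigned int seq;
	uint64_t ns;

	do {
		seq = atomic_load_explicit(&p->seq, memory_order_acquire);
		ns = p->ns_base +
		     cycles_to_ns(cycles - p->cycle_last, p->mult, p->shift);
		atomic_thread_fence(memory_order_acquire);
	} while ((seq & 1) ||
		 seq != atomic_load_explicit(&p->seq, memory_order_relaxed));

	return ns;
}

/* Writer step 1: mark the update and fold elapsed time in at the OLD rate. */
static inline void clock_begin_update(struct clock_params *p,
				      uint64_t now_cycles)
{
	uint64_t ns = p->ns_base +
		      cycles_to_ns(now_cycles - p->cycle_last, p->mult, p->shift);

	atomic_fetch_add_explicit(&p->seq, 1, memory_order_relaxed); /* odd */
	atomic_thread_fence(memory_order_release);
	p->ns_base = ns;
	p->cycle_last = now_cycles;
}

/* Writer step 2: install the new rate. */
static inline void clock_update_mult_shift(struct clock_params *p,
					   uint32_t mult, uint32_t shift)
{
	assert(shift < 64);
	p->mult = mult;
	p->shift = shift;
	/* The tell_hv_that_we_reenlightened() step would go after this. */
}

/* Writer step 3: publish; readers that raced with the update retry. */
static inline void clock_end_update(struct clock_params *p)
{
	atomic_fetch_add_explicit(&p->seq, 1, memory_order_release); /* even */
}
```

The part that matters is that time already elapsed is accumulated at
the old rate before mult/shift change, so the clock stays continuous
across the update and a reader never mixes old and new parameters.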