From: Sean Christopherson <seanjc@xxxxxxxxxx> Sent: Monday, February 10, 2025 8:22 AM > > On Sat, Feb 08, 2025, Michael Kelley wrote: > > From: Sean Christopherson <seanjc@xxxxxxxxxx> Sent: Friday, February 7, 2025 9:23 AM > > > > > > Dropping a few people/lists whose emails are bouncing. > > > > > > On Fri, Jan 31, 2025, Sean Christopherson wrote: > > > > @@ -369,6 +369,11 @@ void __init kvmclock_init(void) > > > > #ifdef CONFIG_X86_LOCAL_APIC > > > > x86_cpuinit.early_percpu_clock_init = kvm_setup_secondary_clock; > > > > #endif > > > > + /* > > > > + * Save/restore "sched" clock state even if kvmclock isn't being used > > > > + * for sched_clock, as kvmclock is still used for wallclock and relies > > > > + * on these hooks to re-enable kvmclock after suspend+resume. > > > > > > This is wrong, wallclock is a different MSR entirely. > > > > > > > + */ > > > > x86_platform.save_sched_clock_state = kvm_save_sched_clock_state; > > > > x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state; > > > > > > And usurping sched_clock save/restore is *really* wrong if kvmclock isn't being > > > used as sched_clock, because when TSC is reset on suspend/hiberation, not doing > > > tsc_{save,restore}_sched_clock_state() results in time going haywire. > > > > > > Subtly, that issue goes all the way back to patch "x86/paravirt: Don't use a PV > > > sched_clock in CoCo guests with trusted TSC" because pulling the rug out from > > > under kvmclock leads to the same problem. > > > > > > The whole PV sched_clock scheme is a disaster. > > > > > > Hyper-V overrides the save/restore callbacks, but _also_ runs the old TSC callbacks, > > > because Hyper-V doesn't ensure that it's actually using the Hyper-V clock for > > > sched_clock. And the code is all kinds of funky, because it tries to keep the > > > x86 code isolated from the generic HV clock code, but (a) there's already x86 PV > > > specific code in drivers/clocksource/hyperv_timer.c, and (b) splitting the code > > > means that Hyper-V overides the sched_clock save/restore hooks even when > > > PARAVIRT=n, i.e. when HV clock can't possibly be used as sched_clock. > > > > Regarding (a), the one occurrence of x86 PV-specific code hyperv_timer.c is > > the call to paravirt_set_sched_clock(), and it's under an #ifdef sequence so that > > it's not built if targeting some other architecture. Or do you see something else > > that is x86-specific? > > > > Regarding (b), in drivers/hv/Kconfig, CONFIG_HYPERV always selects PARAVIRT. > > So the #else clause (where PARAVIRT=n) in that #ifdef sequence could arguably > > have a BUILD_BUG() added. If I recall correctly, other Hyper-V stuff breaks if > > PARAVIRT is forced to "n". So I don't think there's a current problem with the > > sched_clock save/restore hooks. i > > Oh, there are no build issues, and all of the x86 bits are nicely cordoned off. > My complaint is essentially that they're _too_ isolated; putting the sched_clock > save/restore setup in arch/x86/kernel/cpu/mshyperv.c is well-intentioned, but IMO > it does more harm than good because the split makes it difficult to connect the > dots to hv_setup_sched_clock() in drivers/clocksource/hyperv_timer.c. > > > But I would be good with some restructuring so that setting the sched clock > > save/restore hooks is more closely tied to the sched clock choice, > > Yeah, this is the intent of my ranting. After the dust settles, the code can > look like this. I'm good with what you are proposing. And if you want, there's no real need for hv_ref_counter_at_suspend and hv_save/restore_sched_clock_state() to be in the #ifdef sequence since the code has no architecture dependencies. Sure, the only current caller is x86-specific, but the functionality is generic and might useful on some other architecture in the future. That was the essence of my comment on the original patch that added this code [1]. The patch author took it a step further and moved all the code into an x86-specific module, which I was OK with at the time. But your moving it back is probably better. [1] https://lore.kernel.org/all/SN6PR02MB4157141DD58FD6EAE96C7CABD4992@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ > > --- > #ifdef CONFIG_GENERIC_SCHED_CLOCK > static __always_inline void hv_setup_sched_clock(void *sched_clock) > { > /* > * We're on an architecture with generic sched clock (not x86/x64). > * The Hyper-V sched clock read function returns nanoseconds, not > * the normal 100ns units of the Hyper-V synthetic clock. > */ > sched_clock_register(sched_clock, 64, NSEC_PER_SEC); > } > #elif defined CONFIG_PARAVIRT > static u64 hv_ref_counter_at_suspend; > /* > * Hyper-V clock counter resets during hibernation. Save and restore clock > * offset during suspend/resume, while also considering the time passed > * before suspend. This is to make sure that sched_clock using hv tsc page > * based clocksource, proceeds from where it left off during suspend and > * it shows correct time for the timestamps of kernel messages after resume. > */ > static void hv_save_sched_clock_state(void) > { > hv_ref_counter_at_suspend = hv_read_reference_counter(); > } > > static void hv_restore_sched_clock_state(void) > { > /* > * Adjust the offsets used by hv tsc clocksource to > * account for the time spent before hibernation. > * adjusted value = reference counter (time) at suspend > * - reference counter (time) now. > */ > hv_sched_clock_offset -= (hv_ref_counter_at_suspend - > hv_read_reference_counter()); > } > > static __always_inline void hv_setup_sched_clock(void *sched_clock) > { > /* We're on x86/x64 *and* using PV ops */ > paravirt_set_sched_clock(sched_clock, hv_save_sched_clock_state, > hv_restore_sched_clock_state); > } > #else /* !CONFIG_GENERIC_SCHED_CLOCK && !CONFIG_PARAVIRT */ > static __always_inline void hv_setup_sched_clock(void *sched_clock) {} > #endif /* CONFIG_GENERIC_SCHED_CLOCK */ > --- > > > as long as the architecture independence of hyperv_timer.c is preserved. > > LOL, ah yes, the architecture independence of MSRs and TSC :-) This is a digression from what you are trying to accomplish, but the function and symbols names reflect history and are misleading. The terms "TSC" and "MSR" are used, but arch-specific wrapper functions map "TSC" to the arm64 architectural system counter, for example. And the MSR references are all to Hyper-V synthetic MSRs, which are mapped to the arm64 equivalent registers and are accessed via explicit hypercalls instead of rdmsr()/wrmsr(). I wish we had better terminology to use in the generic code. > > Teasing aside, the code is firmly x86-only at the moment. It's selectable only > by x86: > > config HYPERV_TIMER > def_bool HYPERV && X86 > > and since at least commit e39acc37db34 ("clocksource: hyper-v: Provide noinstr > sched_clock()") there are references to symbols/functions that are provided only > by x86. > > I assume arm64 support is a WIP, but keeping the upstream code arch independent > isn't very realistic if the code can't be at least compile-tested. To help > drive-by contributors like myself, maybe select HYPER_TIMER on arm64 for > COMPILE_TEST=y builds? Ah yes. I think I missed that commit e39acc37db34 added hv_raw_get_register(), which doesn't have an arm64 equivalent. I had not previously known about COMPILE_TEST=y. Thanks for pointing that out, and I'll check into using it. Michael > > config HYPERV_TIMER > def_bool HYPERV && (X86 || (COMPILE_TEST && ARM64)) > > I have no plans to touch code outside of CONFIG_PARAVIRT, i.e. outside of code > that is explicitly x86-only, but something along those lines would help people > like me understand the goal/intent, and in theory would also help y'all maintain > the code by detecting breakage.