From: Naman Jain <namjain@xxxxxxxxxxxxxxxxxxx> Sent: Monday, September 16, 2024 10:39 PM > > read_hv_sched_clock_tsc() assumes that the Hyper-V clock counter is > bigger than the variable hv_sched_clock_offset, which is cached during > early boot, but depending on the timing this assumption may be false > when a hibernated VM starts again (the clock counter starts from 0 > again) and is resuming back (Note: hv_init_tsc_clocksource() is not > called during hibernation/resume); consequently, > read_hv_sched_clock_tsc() may return a negative integer (which is > interpreted as a huge positive integer since the return type is u64) > and new kernel messages are prefixed with huge timestamps before > read_hv_sched_clock_tsc() grows big enough (which typically takes > several seconds). > > Fix the issue by saving the Hyper-V clock counter just before the > suspend, and using it to correct the hv_sched_clock_offset in > resume. This makes hv tsc page based sched_clock continuous and ensures > that post resume, it starts from where it left off during suspend. > Override x86_platform.save_sched_clock_state and > x86_platform.restore_sched_clock_state routines to correct this as soon > as possible. > > Note: if Invariant TSC is available, the issue doesn't happen because > 1) we don't register read_hv_sched_clock_tsc() for sched clock: > See commit e5313f1c5404 ("clocksource/drivers/hyper-v: Rework > clocksource and sched clock setup"); > 2) the common x86 code adjusts TSC similarly: see > __restore_processor_state() -> tsc_verify_tsc_adjust(true) and > x86_platform.restore_sched_clock_state(). > > Cc: stable@xxxxxxxxxxxxxxx > Fixes: 1349401ff1aa ("clocksource/drivers/hyper-v: Suspend/resume Hyper-V > clocksource for hibernation") > Co-developed-by: Dexuan Cui <decui@xxxxxxxxxxxxx> > Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx> > Signed-off-by: Naman Jain <namjain@xxxxxxxxxxxxxxxxxxx> > --- > Changes from v2: > https://lore.kernel.org/all/20240911045632.3757-1-namjain@xxxxxxxxxxxxxxxxxxx/ > Addressed Michael's comments: > * Changed commit msg to include information on making timestamps > continuous > * Changed subject to reflect the new file being changed > * Changed variable name for saving offset/counters > * Moved comment on new function introduced from header file to function > definition. > * Removed the equations in comments > * Rebased to latest linux-next tip > > Changes from v1: > https://lore.kernel.org/all/20240909053923.8512-1-namjain@xxxxxxxxxxxxxxxxxxx/ > * Reorganized code as per Michael's comment, and moved the logic to x86 > specific files, to keep hyperv_timer.c arch independent. > > --- > arch/x86/kernel/cpu/mshyperv.c | 58 ++++++++++++++++++++++++++++++ > drivers/clocksource/hyperv_timer.c | 14 +++++++- > include/clocksource/hyperv_timer.h | 2 ++ > 3 files changed, 73 insertions(+), 1 deletion(-) This all looks good to me now. Thanks for indulging all my comments. :-) One minor nit: The "Subject:" prefix should not have a dash in "hyper-v". The goal is to be consistent in the prefixes used for a particular file instead of wandering all over the place. If you check the commit history for arch/x86/kernel/cpu/mshyperv.c, you'll see that it is "x86/hyperv", not "x86/hyperv-v". This is super picky, so don't respin just for this. The maintainer can probably fix it when accepting the patch if there are no other changes. Reviewed-by: Michael Kelley <mhklinux@xxxxxxxxxxx> > > diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c > index ead967479fa6..e8e25d6e64cd 100644 > --- a/arch/x86/kernel/cpu/mshyperv.c > +++ b/arch/x86/kernel/cpu/mshyperv.c > @@ -224,6 +224,63 @@ static void hv_machine_crash_shutdown(struct pt_regs > *regs) > hyperv_cleanup(); > } > #endif /* CONFIG_CRASH_DUMP */ > + > +static u64 hv_ref_counter_at_suspend; > +static void (*old_save_sched_clock_state)(void); > +static void (*old_restore_sched_clock_state)(void); > + > +/* > + * Hyper-V clock counter resets during hibernation. Save and restore clock > + * offset during suspend/resume, while also considering the time passed > + * before suspend. This is to make sure that sched_clock using hv tsc page > + * based clocksource, proceeds from where it left off during suspend and > + * it shows correct time for the timestamps of kernel messages after resume. > + */ > +static void save_hv_clock_tsc_state(void) > +{ > + hv_ref_counter_at_suspend = hv_read_reference_counter(); > +} > + > +static void restore_hv_clock_tsc_state(void) > +{ > + /* > + * Adjust the offsets used by hv tsc clocksource to > + * account for the time spent before hibernation. > + * adjusted value = reference counter (time) at suspend > + * - reference counter (time) now. > + */ > + hv_adj_sched_clock_offset(hv_ref_counter_at_suspend - hv_read_reference_counter()); > +} > + > +/* > + * Functions to override save_sched_clock_state and restore_sched_clock_state > + * functions of x86_platform. The Hyper-V clock counter is reset during > + * suspend-resume and the offset used to measure time needs to be > + * corrected, post resume. > + */ > +static void hv_save_sched_clock_state(void) > +{ > + old_save_sched_clock_state(); > + save_hv_clock_tsc_state(); > +} > + > +static void hv_restore_sched_clock_state(void) > +{ > + restore_hv_clock_tsc_state(); > + old_restore_sched_clock_state(); > +} > + > +static void __init x86_setup_ops_for_tsc_pg_clock(void) > +{ > + if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE)) > + return; > + > + old_save_sched_clock_state = x86_platform.save_sched_clock_state; > + x86_platform.save_sched_clock_state = hv_save_sched_clock_state; > + > + old_restore_sched_clock_state = x86_platform.restore_sched_clock_state; > + x86_platform.restore_sched_clock_state = hv_restore_sched_clock_state; > +} > #endif /* CONFIG_HYPERV */ > > static uint32_t __init ms_hyperv_platform(void) > @@ -590,6 +647,7 @@ static void __init ms_hyperv_init_platform(void) > > /* Register Hyper-V specific clocksource */ > hv_init_clocksource(); > + x86_setup_ops_for_tsc_pg_clock(); > hv_vtl_init_platform(); > #endif > /* > diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c > index 99177835cade..b39dee7b93af 100644 > --- a/drivers/clocksource/hyperv_timer.c > +++ b/drivers/clocksource/hyperv_timer.c > @@ -27,7 +27,8 @@ > #include <asm/mshyperv.h> > > static struct clock_event_device __percpu *hv_clock_event; > -static u64 hv_sched_clock_offset __ro_after_init; > +/* Note: offset can hold negative values after hibernation. */ > +static u64 hv_sched_clock_offset __read_mostly; > > /* > * If false, we're using the old mechanism for stimer0 interrupts > @@ -470,6 +471,17 @@ static void resume_hv_clock_tsc(struct clocksource *arg) > hv_set_msr(HV_MSR_REFERENCE_TSC, tsc_msr.as_uint64); > } > > +/* > + * Called during resume from hibernation, from overridden > + * x86_platform.restore_sched_clock_state routine. This is to adjust offsets > + * used to calculate time for hv tsc page based sched_clock, to account for > + * time spent before hibernation. > + */ > +void hv_adj_sched_clock_offset(u64 offset) > +{ > + hv_sched_clock_offset -= offset; > +} > + > #ifdef HAVE_VDSO_CLOCKMODE_HVCLOCK > static int hv_cs_enable(struct clocksource *cs) > { > diff --git a/include/clocksource/hyperv_timer.h b/include/clocksource/hyperv_timer.h > index 6cdc873ac907..aa5233b1eba9 100644 > --- a/include/clocksource/hyperv_timer.h > +++ b/include/clocksource/hyperv_timer.h > @@ -38,6 +38,8 @@ extern void hv_remap_tsc_clocksource(void); > extern unsigned long hv_get_tsc_pfn(void); > extern struct ms_hyperv_tsc_page *hv_get_tsc_page(void); > > +extern void hv_adj_sched_clock_offset(u64 offset); > + > static __always_inline bool > hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, > u64 *cur_tsc, u64 *time) > > base-commit: a430d95c5efa2b545d26a094eb5f624e36732af0 > -- > 2.34.1