Re: [PATCH v3 1/3] kvm: Introduce kvm_total_suspend_ns().

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 15 Jan 2025 13:49:35 -0800

On Tue, Jan 07, 2025, Suleiman Souhlal wrote:
> It returns the cumulative nanoseconds that the host has been suspended.
> It is intended to be used for reporting host suspend time to the guest.

...

>  #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
> +static int kvm_pm_notifier(struct kvm *kvm, unsigned long state)
> +{
> +	switch (state) {
> +	case PM_HIBERNATION_PREPARE:
> +	case PM_SUSPEND_PREPARE:
> +		last_suspend = ktime_get_boottime_ns();
> +	case PM_POST_HIBERNATION:
> +	case PM_POST_SUSPEND:
> +		total_suspend_ns += ktime_get_boottime_ns() - last_suspend;

After spending too much time poking around kvmlock and sched_clock code, I'm pretty
sure that accounting *all* suspend time to steal_time is wildly inaccurate for
most clocksources that will be used by KVM x86 guests.

KVM already adjusts TSC, and by extension kvmclock, to account for the TSC going
backwards due to suspend+resume.  I haven't dug super deep, buy I assume/hope the
majority of suspend time is handled by massaging guest TSC.

There's still a notable gap, as KVM's TSC adjustments likely won't account for
the lag between CPUs coming online and vCPU's being restarted, but I don't know
that having KVM account the suspend duration is the right way to solve that issue.