On Tue, Jan 07, 2025, Suleiman Souhlal wrote:
> It returns the cumulative nanoseconds that the host has been suspended.
> It is intended to be used for reporting host suspend time to the guest.

...

>  #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
> +static int kvm_pm_notifier(struct kvm *kvm, unsigned long state)
> +{
> +	switch (state) {
> +	case PM_HIBERNATION_PREPARE:
> +	case PM_SUSPEND_PREPARE:
> +		last_suspend = ktime_get_boottime_ns();
> +	case PM_POST_HIBERNATION:
> +	case PM_POST_SUSPEND:
> +		total_suspend_ns += ktime_get_boottime_ns() - last_suspend;

After spending too much time poking around kvmclock and sched_clock code, I'm
pretty sure that accounting *all* suspend time to steal_time is wildly
inaccurate for most clocksources that will be used by KVM x86 guests.

KVM already adjusts TSC, and by extension kvmclock, to account for the TSC going
backwards due to suspend+resume.  I haven't dug super deep, but I assume/hope
the majority of suspend time is handled by massaging guest TSC.

There's still a notable gap, as KVM's TSC adjustments likely won't account for
the lag between CPUs coming online and vCPUs being restarted, but I don't know
that having KVM account the suspend duration is the right way to solve that
issue.
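
To make the double-counting concern concrete, here is a minimal userspace
sketch, not kernel code.  The names suspend_acct, acct_pm_event() and
residual_gap_ns() are hypothetical, and timestamps are passed in explicitly
rather than read from ktime_get_boottime_ns().  It assumes the portion of the
suspend that is already reflected in guest TSC is known, which is itself an
assumption; only the remainder would be a candidate for extra accounting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the PM events the notifier reacts to. */
enum pm_event { EV_SUSPEND_PREPARE, EV_POST_SUSPEND };

/* Hypothetical stand-in for the last_suspend/total_suspend_ns pair. */
struct suspend_acct {
	uint64_t last_suspend_ns;
	uint64_t total_suspend_ns;
};

/* Record the start of a suspend, or add its duration to the total. */
void acct_pm_event(struct suspend_acct *a, enum pm_event ev,
		   uint64_t boottime_ns)
{
	switch (ev) {
	case EV_SUSPEND_PREPARE:
		a->last_suspend_ns = boottime_ns;
		break;
	case EV_POST_SUSPEND:
		/* Boot time is monotonic, so it must not run backwards. */
		assert(boottime_ns >= a->last_suspend_ns);
		a->total_suspend_ns += boottime_ns - a->last_suspend_ns;
		break;
	}
}

/*
 * Suspend time NOT already visible to the guest via a TSC adjustment.
 * tsc_adjusted_ns is an assumed input; this sketch does not compute it.
 */
uint64_t residual_gap_ns(uint64_t suspend_ns, uint64_t tsc_adjusted_ns)
{
	return suspend_ns > tsc_adjusted_ns ? suspend_ns - tsc_adjusted_ns : 0;
}
```

E.g. for a 5000 ns suspend of which 4800 ns is already reflected in guest TSC,
only 200 ns would remain; reporting the full 5000 ns as steal time would count
most of the interval twice.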