On 9/4/24 09:35, Suleiman Souhlal wrote:
On Wed, Aug 28, 2024 at 12:34:26PM -0700, Sean Christopherson wrote:
On Wed, Aug 21, 2024, Steven Rostedt wrote:
From: Steven Rostedt <rostedt@xxxxxxxxxxx>
Commit 92b5265d38f6a ("KVM: Depend on HIGH_RES_TIMERS") added a dependency
to high resolution timers with the comment:
KVM lapic timer and tsc deadline timer based on hrtimer,
setting a leftmost node to rb tree and then do hrtimer reprogram.
If hrtimer not configured as high resolution, hrtimer_enqueue_reprogram
do nothing and then make kvm lapic timer and tsc deadline timer fail.
That was back in 2012, where hrtimer_start_range_ns() would do the
reprogramming with hrtimer_enqueue_reprogram(). But as that was a nop with
high resolution timers disabled, this did not work. But a lot has changed
in the last 12 years.
For example, commit 49a2a07514a3a ("hrtimer: Kick lowres dynticks targets on
timer enqueue") modifies __hrtimer_start_range_ns() to work with low res
timers. There's been lots of other changes that make low res work.
I added this change to my main server that runs all my VMs (my mail
server, my web server, my ssh server) and disabled HIGH_RES_TIMERS and the
system has been running just fine for over a month.
ChromeOS has tested this before as well, and it hasn't seen any issues with
running KVM with high res timers disabled.
Can you provide some background on why this is desirable, and what the effective
tradeoffs are? Mostly so that future users have some chance of making an
informed decision. Realistically, anyone running with HIGH_RES_TIMERS=n is likely
already aware of the tradeoffs, but it'd be nice to capture the info here.
We have found that disabling HR timers saves power without degrading
the user experience too much.
This might have some issues on guests that do not support kvmclock,
because they rely on precise delivery of periodic timers to keep their
clock running. This can be the APIC timer (provided by the kernel), the
RTC (provided by userspace), or the i8254 (choice of kernel/userspace).
These guests are few and far between these days, and in the case of the
APIC timer + Intel hosts we can use the preemption timer (which is
TSC-based and has better latency _and_ accuracy). Furthermore, only x86
is requiring CONFIG_HIGH_RES_TIMERS, so it's probably just excessive
care and we can even apply Steven's patch as is.
Alternatively, the "depends on HIGH_RES_TIMERS || EXPERT" could be added
to virt/kvm. Or a pr_warn could be added to kvm_init if HIGH_RES_TIMERS
are not enabled.
But in general, it seems that Linux has a laissez-faire approach to
disabling CONFIG_HIGH_RES_TIMERS - there must be other code in the
kernel (maybe sound/?) that is relying on having high-enough HZ or
hrtimers but that's not documented anywhere. I don't have an objection
to doing the same in KVM, honestly, since most systems are running
CONFIG_HIGH_RES_TIMERS anyway.
Paolo