On 9/20/2024 2:24 PM, Nikunj A. Dadhania wrote: > On 9/20/2024 12:51 PM, Sean Christopherson wrote: >> On Fri, Sep 20, 2024, Nikunj A. Dadhania wrote: >>> On 9/18/2024 5:37 PM, Sean Christopherson wrote: >>>> On Mon, Sep 16, 2024, Nikunj A. Dadhania wrote: >>>>> On 9/13/2024 11:00 PM, Sean Christopherson wrote: >>>>>>> Signed-off-by: Nikunj A Dadhania <nikunj@xxxxxxx> >>>>>>> Tested-by: Peter Gonda <pgonda@xxxxxxxxxx> >>>>>>> --- >>>>>>> arch/x86/kernel/kvmclock.c | 2 +- >>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c >>>>>>> index 5b2c15214a6b..3d03b4c937b9 100644 >>>>>>> --- a/arch/x86/kernel/kvmclock.c >>>>>>> +++ b/arch/x86/kernel/kvmclock.c >>>>>>> @@ -289,7 +289,7 @@ void __init kvmclock_init(void) >>>>>>> { >>>>>>> u8 flags; >>>>>>> >>>>>>> - if (!kvm_para_available() || !kvmclock) >>>>>>> + if (!kvm_para_available() || !kvmclock || cc_platform_has(CC_ATTR_GUEST_SECURE_TSC)) >>>>>> >>>>>> I would much prefer we solve the kvmclock vs. TSC fight in a generic way. Unless >>>>>> I've missed something, the fact that the TSC is more trusted in the SNP/TDX world >>>>>> is simply what's forcing the issue, but it's not actually the reason why Linux >>>>>> should prefer the TSC over kvmclock. The underlying reason is that platforms that >>>>>> support SNP/TDX are guaranteed to have a stable, always running TSC, i.e. that the >>>>>> TSC is a superior timesource purely from a functionality perspective. That it's >>>>>> more secure is icing on the cake. >>>>> >>>>> Are you suggesting that whenever the guest is either SNP or TDX, kvmclock >>>>> should be disabled assuming that timesource is stable and always running? >>>> >>>> No, I'm saying that the guest should prefer the raw TSC over kvmclock if the TSC >>>> is stable, irrespective of SNP or TDX. This is effectively already done for the >>>> timekeeping base (see commit 7539b174aef4 ("x86: kvmguest: use TSC clocksource if >>>> invariant TSC is exposed")), but the scheduler still uses kvmclock thanks to the >>>> kvm_sched_clock_init() code. >>> >>> The kvm-clock and tsc-early both are having the rating of 299. As they are of >>> same rating, kvm-clock is being picked up first. >>> >>> Is it fine to drop the clock rating of kvmclock to 298 ? With this tsc-early will >>> be picked up instead. >> >> IMO, it's ugly, but that's a problem with the rating system inasmuch as anything. >> >> But the kernel will still be using kvmclock for the scheduler clock, which is >> undesirable. > > Agree, kvm_sched_clock_init() is still being called. The above hunk was to use > tsc-early/tsc as the clocksource and not kvm-clock. How about the below patch: From: Nikunj A Dadhania <nikunj@xxxxxxx> Date: Tue, 28 Nov 2023 18:29:56 +0530 Subject: [RFC PATCH] x86/kvmclock: Prefer invariant TSC as the clocksource and scheduler clock For platforms that support stable and always running TSC, although the kvm-clock rating is dropped to 299 to prefer TSC, the guest scheduler clock still keeps on using the kvm-clock which is undesirable. Moreover, as the kvm-clock and early-tsc clocksource are both registered with 299 rating, kvm-clock is being picked up momentarily instead of selecting more stable tsc-early clocksource. kvm-clock: Using msrs 4b564d01 and 4b564d00 kvm-clock: using sched offset of 1799357702246960 cycles clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns tsc: Detected 1996.249 MHz processor clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x398cadd9d93, max_idle_ns: 881590552906 ns clocksource: Switched to clocksource kvm-clock clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x398cadd9d93, max_idle_ns: 881590552906 ns clocksource: Switched to clocksource tsc Drop the kvm-clock rating to 298, so that tsc-early is picked up before kvm-clock and use TSC for scheduler clock as well when the TSC is invariant and stable. Signed-off-by: Nikunj A Dadhania <nikunj@xxxxxxx> --- The issue we see here is that on bare-metal if the TSC is marked unstable, then the sched-clock will fall back to jiffies. In the virtualization case, do we want to fall back to kvm-clock when TSC is marked unstable? --- arch/x86/kernel/kvmclock.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index 5b2c15214a6b..c997b2628c4b 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -317,9 +317,6 @@ void __init kvmclock_init(void) if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT)) pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT); - flags = pvclock_read_flags(&hv_clock_boot[0].pvti); - kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT); - x86_platform.calibrate_tsc = kvm_get_tsc_khz; x86_platform.calibrate_cpu = kvm_get_tsc_khz; x86_platform.get_wallclock = kvm_get_wallclock; @@ -341,8 +338,12 @@ void __init kvmclock_init(void) */ if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && boot_cpu_has(X86_FEATURE_NONSTOP_TSC) && - !check_tsc_unstable()) - kvm_clock.rating = 299; + !check_tsc_unstable()) { + kvm_clock.rating = 298; + } else { + flags = pvclock_read_flags(&hv_clock_boot[0].pvti); + kvm_sched_clock_init(flags & PVCLOCK_TSC_STABLE_BIT); + } clocksource_register_hz(&kvm_clock, NSEC_PER_SEC); pv_info.name = "KVM"; -- 2.34.1