On 9/18/2024 5:37 PM, Sean Christopherson wrote: > On Mon, Sep 16, 2024, Nikunj A. Dadhania wrote: >> On 9/13/2024 11:00 PM, Sean Christopherson wrote: >>>> Signed-off-by: Nikunj A Dadhania <nikunj@xxxxxxx> >>>> Tested-by: Peter Gonda <pgonda@xxxxxxxxxx> >>>> --- >>>> arch/x86/kernel/kvmclock.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c >>>> index 5b2c15214a6b..3d03b4c937b9 100644 >>>> --- a/arch/x86/kernel/kvmclock.c >>>> +++ b/arch/x86/kernel/kvmclock.c >>>> @@ -289,7 +289,7 @@ void __init kvmclock_init(void) >>>> { >>>> u8 flags; >>>> >>>> - if (!kvm_para_available() || !kvmclock) >>>> + if (!kvm_para_available() || !kvmclock || cc_platform_has(CC_ATTR_GUEST_SECURE_TSC)) >>> >>> I would much prefer we solve the kvmclock vs. TSC fight in a generic way. Unless >>> I've missed something, the fact that the TSC is more trusted in the SNP/TDX world >>> is simply what's forcing the issue, but it's not actually the reason why Linux >>> should prefer the TSC over kvmclock. The underlying reason is that platforms that >>> support SNP/TDX are guaranteed to have a stable, always running TSC, i.e. that the >>> TSC is a superior timesource purely from a functionality perspective. That it's >>> more secure is icing on the cake. >> >> Are you suggesting that whenever the guest is either SNP or TDX, kvmclock >> should be disabled assuming that timesource is stable and always running? > > No, I'm saying that the guest should prefer the raw TSC over kvmclock if the TSC > is stable, irrespective of SNP or TDX. This is effectively already done for the > timekeeping base (see commit 7539b174aef4 ("x86: kvmguest: use TSC clocksource if > invariant TSC is exposed")), but the scheduler still uses kvmclock thanks to the > kvm_sched_clock_init() code. The kvm-clock and tsc-early both are having the rating of 299. As they are of same rating, kvm-clock is being picked up first. Is it fine to drop the clock rating of kvmclock to 298 ? With this tsc-early will be picked up instead. diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index fafdbf813ae3..1982cee74354 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -289,7 +289,7 @@ void __init kvmclock_init(void) { u8 flags; - if (!kvm_para_available() || !kvmclock || cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC)) + if (!kvm_para_available() || !kvmclock) return; if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) { @@ -342,7 +342,7 @@ void __init kvmclock_init(void) if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && boot_cpu_has(X86_FEATURE_NONSTOP_TSC) && !check_tsc_unstable()) - kvm_clock.rating = 299; + kvm_clock.rating = 298; clocksource_register_hz(&kvm_clock, NSEC_PER_SEC); pv_info.name = "KVM"; [ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00 [ 0.000001] kvm-clock: using sched offset of 6630881179920185 cycles [ 0.001266] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns [ 0.005347] tsc: Detected 1996.247 MHz processor [ 0.263100] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x398caa9ddcb, max_idle_ns: 881590739785 ns [ 0.980456] clocksource: Switched to clocksource tsc-early [ 1.059332] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x398caa9ddcb, max_idle_ns: 881590739785 ns [ 1.062094] clocksource: Switched to clocksource tsc > The other aspect of this to consider is wallclock. If I'm reading the code > correctly, _completely_ disabling kvmclock will case the kernel to keep using the > RTC for wallclock. Using the RTC is an amusingly bad decision for SNP and TDX > (and regular VMs), as the RTC is a slooow emulation path and it's still very much > controlled by the untrusted host. Right, this is not expected. > Unless you have a better idea for what to do with wallclock, I think the right > approach is to come up a cleaner way to prefer TSC over kvmclock for timekeeping > and the scheduler, but leave wallclock as-is. And then for SNP and TDX, "assert" > that the TSC is being used instead of kvmclock. Presumably, all SNP and TDX > hosts provide a stable TSC, so there's probably no reason for the guest to even > care if the TSC is "secure". > > Note, past me missed the wallclock side of things[*], so I don't think hiding > kvmclock entirely is the best solution. > > [*] https://lore.kernel.org/all/ZOjF2DMBgW%2FzVvL3@xxxxxxxxxx Regards Nikunj