Re: [PATCH v11 19/20] x86/kvmclock: Skip kvmclock when Secure TSC is available

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/18/2024 5:37 PM, Sean Christopherson wrote:
> On Mon, Sep 16, 2024, Nikunj A. Dadhania wrote:
>> On 9/13/2024 11:00 PM, Sean Christopherson wrote:
>>>> Signed-off-by: Nikunj A Dadhania <nikunj@xxxxxxx>
>>>> Tested-by: Peter Gonda <pgonda@xxxxxxxxxx>
>>>> ---
>>>>  arch/x86/kernel/kvmclock.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
>>>> index 5b2c15214a6b..3d03b4c937b9 100644
>>>> --- a/arch/x86/kernel/kvmclock.c
>>>> +++ b/arch/x86/kernel/kvmclock.c
>>>> @@ -289,7 +289,7 @@ void __init kvmclock_init(void)
>>>>  {
>>>>  	u8 flags;
>>>>  
>>>> -	if (!kvm_para_available() || !kvmclock)
>>>> +	if (!kvm_para_available() || !kvmclock || cc_platform_has(CC_ATTR_GUEST_SECURE_TSC))
>>>
>>> I would much prefer we solve the kvmclock vs. TSC fight in a generic way.  Unless
>>> I've missed something, the fact that the TSC is more trusted in the SNP/TDX world
>>> is simply what's forcing the issue, but it's not actually the reason why Linux
>>> should prefer the TSC over kvmclock.  The underlying reason is that platforms that
>>> support SNP/TDX are guaranteed to have a stable, always running TSC, i.e. that the
>>> TSC is a superior timesource purely from a functionality perspective.  That it's
>>> more secure is icing on the cake.
>>
>> Are you suggesting that whenever the guest is either SNP or TDX, kvmclock
>> should be disabled assuming that timesource is stable and always running?
> 
> No, I'm saying that the guest should prefer the raw TSC over kvmclock if the TSC
> is stable, irrespective of SNP or TDX.  This is effectively already done for the
> timekeeping base (see commit 7539b174aef4 ("x86: kvmguest: use TSC clocksource if
> invariant TSC is exposed")), but the scheduler still uses kvmclock thanks to the
> kvm_sched_clock_init() code.

The kvm-clock and tsc-early both are having the rating of 299. As they are of
same rating, kvm-clock is being picked up first.

Is it fine to drop the clock rating of kvmclock to 298 ? With this tsc-early will
be picked up instead.

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index fafdbf813ae3..1982cee74354 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -289,7 +289,7 @@ void __init kvmclock_init(void)
 {
 	u8 flags;
 
-	if (!kvm_para_available() || !kvmclock || cc_platform_has(CC_ATTR_GUEST_SNP_SECURE_TSC))
+	if (!kvm_para_available() || !kvmclock)
 		return;
 
 	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
@@ -342,7 +342,7 @@ void __init kvmclock_init(void)
 	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
 	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
 	    !check_tsc_unstable())
-		kvm_clock.rating = 299;
+		kvm_clock.rating = 298;
 
 	clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
 	pv_info.name = "KVM";



[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000001] kvm-clock: using sched offset of 6630881179920185 cycles
[    0.001266] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.005347] tsc: Detected 1996.247 MHz processor
[    0.263100] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x398caa9ddcb, max_idle_ns: 881590739785 ns
[    0.980456] clocksource: Switched to clocksource tsc-early
[    1.059332] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x398caa9ddcb, max_idle_ns: 881590739785 ns
[    1.062094] clocksource: Switched to clocksource tsc

> The other aspect of this to consider is wallclock.  If I'm reading the code
> correctly, _completely_ disabling kvmclock will case the kernel to keep using the
> RTC for wallclock.  Using the RTC is an amusingly bad decision for SNP and TDX
> (and regular VMs), as the RTC is a slooow emulation path and it's still very much
> controlled by the untrusted host.

Right, this is not expected.
 
> Unless you have a better idea for what to do with wallclock, I think the right
> approach is to come up a cleaner way to prefer TSC over kvmclock for timekeeping
> and the scheduler, but leave wallclock as-is.  And then for SNP and TDX, "assert"
> that the TSC is being used instead of kvmclock.  Presumably, all SNP and TDX
> hosts provide a stable TSC, so there's probably no reason for the guest to even
> care if the TSC is "secure".
> 
> Note, past me missed the wallclock side of things[*], so I don't think hiding
> kvmclock entirely is the best solution.
> 
> [*] https://lore.kernel.org/all/ZOjF2DMBgW%2FzVvL3@xxxxxxxxxx

Regards
Nikunj




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux