On Tue, 2021-05-11 at 15:44 +0300, Maxim Levitsky wrote:
> On Mon, 2021-05-10 at 16:08 +0000, Stamatis, Ilias wrote:
> > On Mon, 2021-05-10 at 16:54 +0300, Maxim Levitsky wrote:
> > > On Thu, 2021-05-06 at 10:32 +0000, ilstam@xxxxxxxxxxx wrote:
> > > > From: Ilias Stamatis <ilstam@xxxxxxxxxx>
> > > > 
> > > > Calculating the current TSC offset is done differently when nested TSC
> > > > scaling is used.
> > > > 
> > > > Signed-off-by: Ilias Stamatis <ilstam@xxxxxxxxxx>
> > > > ---
> > > >  arch/x86/kvm/vmx/vmx.c | 26 ++++++++++++++++++++------
> > > >  1 file changed, 20 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > > index 49241423b854..df7dc0e4c903 100644
> > > > --- a/arch/x86/kvm/vmx/vmx.c
> > > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > > @@ -1797,10 +1797,16 @@ static void setup_msrs(struct vcpu_vmx *vmx)
> > > >          vmx_update_msr_bitmap(&vmx->vcpu);
> > > >  }
> > > >  
> > > > -static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> > > > +/*
> > > > + * This function receives the requested offset for L1 as an argument but it
> > > > + * actually writes the "current" tsc offset to the VMCS and returns it. The
> > > > + * current offset might be different in case an L2 guest is currently running
> > > > + * and its VMCS02 is loaded.
> > > > + */
> > > 
> > > (Not related to this patch) It might be a good idea to rename this callback
> > > instead of this comment, but I am not sure about it.
> > > 
> > 
> > Yes! I was planning to do this on v2 anyway as the name of that function
> > is completely misleading/wrong IMHO.
> > 
> > I think that even the comment inside it that explains that when L1
> > doesn't trap WRMSR then L2 TSC writes overwrite L1's TSC is misplaced.
> > It should go one or more levels above I believe.
> > 
> > This function simply updates the TSC offset in the current VMCS
> > depending on whether L1 or L2 is executing.
> > 
> > > > +static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
> > > >  {
> > > >          struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> > > > -        u64 g_tsc_offset = 0;
> > > > +        u64 cur_offset = l1_offset;
> > > >  
> > > >          /*
> > > >           * We're here if L1 chose not to trap WRMSR to TSC. According
> > > > @@ -1809,11 +1815,19 @@ static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
> > > >           * to the newly set TSC to get L2's TSC.
> > > >           */
> > > >          if (is_guest_mode(vcpu) &&
> > > > -            (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
> > > > -                g_tsc_offset = vmcs12->tsc_offset;
> > > > +            (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)) {
> > > > +                if (vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_TSC_SCALING) {
> > > > +                        cur_offset = kvm_compute_02_tsc_offset(
> > > > +                                        l1_offset,
> > > > +                                        vmcs12->tsc_multiplier,
> > > > +                                        vmcs12->tsc_offset);
> > > > +                } else {
> > > > +                        cur_offset = l1_offset + vmcs12->tsc_offset;
> > > > +                }
> > > > +        }
> > > >  
> > > > -        vmcs_write64(TSC_OFFSET, offset + g_tsc_offset);
> > > > -        return offset + g_tsc_offset;
> > > > +        vmcs_write64(TSC_OFFSET, cur_offset);
> > > > +        return cur_offset;
> > > >  }
> > > >  
> > > >  /*
> > > 
> > > This code would be ideal to move to common code as SVM will do basically
> > > the same thing.
> > > Doesn't have to be done now though.
> > > 
> > 
> > Hmm, how can we do the feature availability checking in common code?
> 
> We can add a vendor callback for this.
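(For reference, the computation such a common helper would need is essentially a
single fixed-point multiply-and-add: scale L1's offset by the multiplier L1 set
for L2, then add L1's offset for L2. Below is a rough, untested sketch of that
idea. The helper name is taken from this series, but the body here is only
illustrative and not necessarily how the series implements it.)

u64 kvm_compute_02_tsc_offset(u64 l1_offset, u64 l2_multiplier, u64 l2_offset)
{
        u64 offset = l1_offset;

        /*
         * Scale L1's TSC offset by the multiplier L1 programmed for L2,
         * unless L1 left scaling at the default 1.0 ratio. Using
         * kvm_tsc_scaling_ratio_frac_bits keeps the same code correct for
         * both the VMX (16.48) and SVM (8.32) fixed-point formats.
         */
        if (l2_multiplier != kvm_default_tsc_scaling_ratio)
                offset = mul_s64_u64_shr((s64)l1_offset, l2_multiplier,
                                         kvm_tsc_scaling_ratio_frac_bits);

        /* Finally add the TSC offset that L1 programmed for L2. */
        return offset + l2_offset;
}

(With something like that in common code, vmx_write_l1_tsc_offset() - or its
renamed successor - would only need to pass in vmcs12->tsc_multiplier and
vmcs12->tsc_offset when L2 is active.)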
> 
> Just a few thoughts about how I think we can implement
> the nested TSC scaling in (mostly) common code:
> 
> Assuming that the common code knows that:
> 
> 1. Nested guest is running (already the case)
> 
> 2. The format of the scaling multiplier is known
> (thankfully both SVM and VMX use a fixed-point binary number).
> 
> SVM is using the 8.32 format and VMX the 16.48 format.
> 
> The common code already knows this via
> kvm_max_tsc_scaling_ratio/kvm_tsc_scaling_ratio_frac_bits.
> 
> 3. The value of the nested TSC scaling multiplier
> is known to the common code.
> 
> (a callback to VMX/SVM code to query the TSC scaling value,
> and have it return 1 when TSC scaling is disabled should work)

I suppose you mean return kvm_default_tsc_scaling_ratio

> Then the common code can do the whole thing, and only
> let the SVM/VMX code write the actual multiplier.
> 
> As far as I know, on SVM the TSC scaling works like this:
> 
> 1. SVM has a CPUID bit to indicate that TSC scaling is supported
> (X86_FEATURE_TSCRATEMSR).
> 
> When this bit is set, the TSC scale ratio is unconditionally enabled (but
> can be just 1), and it is set via a special MSR (MSR_AMD64_TSC_RATIO)
> rather than a field in the VMCB (someone at AMD did cut corners...).
> 
> However, since the TSC scaling is only effective when a guest is running,
> that MSR can be treated almost as if it was just another VMCB field.
> 
> The TSC scale value is a 32-bit fraction plus an 8-bit integer part
> (as opposed to a 48-bit fraction and a 16-bit integer part on VMX).
> 
> I don't think that there are any other differences.
> 
> I should also note that I can do all of the above myself if
> I end up implementing the nested TSC scaling on AMD,
> so I don't object much to the way that this patch series is done.
> 

That's fine, I will add the callbacks and move everything to common code.
Then you can fill in the SVM-specific bits if you want.

Thanks,
Ilias
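P.S. To make the callback idea a bit more concrete, here is a rough, untested
sketch of what the VMX side of the multiplier query could look like. The name,
placement and exact conditions are illustrative only, not the final interface:

static u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
{
        struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

        /*
         * Report the multiplier L1 programmed for the currently loaded L2.
         * Whenever nested TSC scaling is not in effect, fall back to the
         * default 1.0 ratio so that common code can use the value blindly.
         */
        if (is_guest_mode(vcpu) &&
            (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING) &&
            (vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_TSC_SCALING))
                return vmcs12->tsc_multiplier;

        return kvm_default_tsc_scaling_ratio;
}

The SVM counterpart would presumably return whatever ratio L1 has set up for L2
through MSR_AMD64_TSC_RATIO, again falling back to kvm_default_tsc_scaling_ratio
when scaling is not in use.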