On Mon, 2021-05-10 at 16:52 +0300, Maxim Levitsky wrote:
> On Thu, 2021-05-06 at 10:32 +0000, ilstam@xxxxxxxxxxx wrote:
> > From: Ilias Stamatis <ilstam@xxxxxxxxxx>
> > 
> > Sometimes kvm_scale_tsc() needs to use the current scaling ratio and
> > other times (like when reading the TSC from user space) it needs to use
> > L1's scaling ratio. Have the caller specify this by passing an
> > additional boolean argument.
> > 
> > Signed-off-by: Ilias Stamatis <ilstam@xxxxxxxxxx>
> > ---
> >  arch/x86/include/asm/kvm_host.h |  2 +-
> >  arch/x86/kvm/x86.c              | 21 +++++++++++++--------
> >  2 files changed, 14 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 132e820525fb..cdddbf0b1177 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1779,7 +1779,7 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
> >  void kvm_define_user_return_msr(unsigned index, u32 msr);
> >  int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
> >  
> > -u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
> > +u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc, bool l1);
> >  u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
> >  
> >  unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 7bc5155ac6fd..26a4c0f46f15 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2241,10 +2241,14 @@ static inline u64 __scale_tsc(u64 ratio, u64 tsc)
> >          return mul_u64_u64_shr(tsc, ratio, kvm_tsc_scaling_ratio_frac_bits);
> >  }
> >  
> > -u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
> > +/*
> > + * If l1 is true the TSC is scaled using L1's scaling ratio, otherwise
> > + * the current scaling ratio is used.
> > + */
> > +u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc, bool l1)
> >  {
> >          u64 _tsc = tsc;
> > -        u64 ratio = vcpu->arch.tsc_scaling_ratio;
> > +        u64 ratio = l1 ? vcpu->arch.l1_tsc_scaling_ratio : vcpu->arch.tsc_scaling_ratio;
> >  
> >          if (ratio != kvm_default_tsc_scaling_ratio)
> >                  _tsc = __scale_tsc(ratio, tsc);

> I do wonder if it is better to have kvm_scale_tsc_l1 and kvm_scale_tsc instead for better
> readability?
> 

That makes sense. Will do. (A rough sketch of the split is below.)

> > @@ -2257,14 +2261,14 @@ static u64 kvm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
> >  {
> >          u64 tsc;
> >  
> > -        tsc = kvm_scale_tsc(vcpu, rdtsc());
> > +        tsc = kvm_scale_tsc(vcpu, rdtsc(), true);

> Here we have a somewhat dangerous assumption that this function
> will always be used with L1 tsc values.
> 
> The kvm_compute_tsc_offset should at least be renamed to
> kvm_compute_tsc_offset_l1 or something like that.
> 
> Currently the assumption holds though:
> 
> We call the kvm_compute_tsc_offset in:
> 
> -> kvm_synchronize_tsc which is nowadays thankfully only called
>    on TSC writes from qemu, which are assumed to be L1 values.
> 
>    (this is pending a rework of the whole thing which I started
>    some time ago but haven't had a chance to finish it yet)
> 
> -> Guest write of MSR_IA32_TSC. The value written is in L1 units,
>    since TSC offset/scaling only covers RDTSC and RDMSR of the IA32_TSC,
>    while WRMSR should be intercepted by L1 and emulated.
>    If it is not emulated, then L2 would just write L1 value.
> 
> -> in kvm_arch_vcpu_load, when TSC is unstable, we always try to resume
>    the guest from the same TSC value as it had seen last time,
>    and then catchup.

Yes. I wasn't sure about kvm_compute_tsc_offset, but my understanding is
that all of its callers want the L1 value scaled. Renaming it to
kvm_compute_tsc_offset_l1 sounds like a great idea.
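Roughly what I have in mind for the split and the rename, as an untested
sketch only (the __kvm_scale_tsc() helper and the kvm_scale_tsc_l1() /
kvm_compute_tsc_offset_l1() names are placeholders, not final):

static u64 __kvm_scale_tsc(u64 ratio, u64 tsc)
{
        u64 _tsc = tsc;

        /* Only do the multiply when a non-default ratio is in use. */
        if (ratio != kvm_default_tsc_scaling_ratio)
                _tsc = __scale_tsc(ratio, tsc);

        return _tsc;
}

/* Scale using the currently active ratio (L2's while L2 is running). */
u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
{
        return __kvm_scale_tsc(vcpu->arch.tsc_scaling_ratio, tsc);
}
EXPORT_SYMBOL_GPL(kvm_scale_tsc);

/* Scale using L1's ratio, for host-initiated accesses and other L1 TSC values. */
u64 kvm_scale_tsc_l1(struct kvm_vcpu *vcpu, u64 tsc)
{
        return __kvm_scale_tsc(vcpu->arch.l1_tsc_scaling_ratio, tsc);
}
EXPORT_SYMBOL_GPL(kvm_scale_tsc_l1);

/* All current callers pass L1 TSC values, hence the _l1 suffix. */
static u64 kvm_compute_tsc_offset_l1(struct kvm_vcpu *vcpu, u64 target_tsc)
{
        return target_tsc - kvm_scale_tsc_l1(vcpu, rdtsc());
}

The callers that currently pass 'true' would then simply call
kvm_scale_tsc_l1() directly.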
> Also host TSC values are used, and after reading this function,
> I recommend to rename the vcpu->arch.last_guest_tsc
> to vcpu->arch.last_guest_tsc_l1 to document this.

OK

> 
> >          return target_tsc - tsc;
> >  }
> >  
> >  u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
> >  {
> > -        return vcpu->arch.l1_tsc_offset + kvm_scale_tsc(vcpu, host_tsc);
> > +        return vcpu->arch.l1_tsc_offset + kvm_scale_tsc(vcpu, host_tsc, true);
> 
> OK
> 
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
> >  
> > @@ -2395,9 +2399,9 @@ static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
> >  
> >  static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
> >  {
> > -        if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
> > +        if (vcpu->arch.l1_tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
> >                  WARN_ON(adjustment < 0);
> 
> This should belong to patch 2 IMHO.
> 

Right, I will move it.

> > -        adjustment = kvm_scale_tsc(vcpu, (u64) adjustment);
> > +        adjustment = kvm_scale_tsc(vcpu, (u64) adjustment, true);
> 
> OK
> 
> >          adjust_tsc_offset_guest(vcpu, adjustment);
> >  }
> >  
> > @@ -2780,7 +2784,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
> >          /* With all the info we got, fill in the values */
> >  
> >          if (kvm_has_tsc_control)
> > -                tgt_tsc_khz = kvm_scale_tsc(v, tgt_tsc_khz);
> > +                tgt_tsc_khz = kvm_scale_tsc(v, tgt_tsc_khz, true);
> 
> OK (kvmclock is for L1 only, L1 hypervisor is free to expose its own kvmclock to L2)
> 
> > 
> >          if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) {
> >                  kvm_get_time_scale(NSEC_PER_SEC, tgt_tsc_khz * 1000LL,
> > @@ -3474,7 +3478,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >                  u64 tsc_offset = msr_info->host_initiated ? vcpu->arch.l1_tsc_offset :
> >                                                              vcpu->arch.tsc_offset;
> >  
> > -                msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + tsc_offset;
> > +                msr_info->data = kvm_scale_tsc(vcpu, rdtsc(),
> > +                                               msr_info->host_initiated) + tsc_offset;
> 
> Since we now do two things that depend on msr_info->host_initiated, I
> think I would prefer to convert this back to regular 'if'.
> I don't have a strong opinion on this though.
> 

Agreed. Thanks!

Ilias

> >                  break;
> >          }
> >          case MSR_MTRRcap:
> 
> Best regards,
>         Maxim Levitsky
> 
> 
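P.S. For the MSR_IA32_TSC read, the regular 'if' would then look roughly
like this (again only a sketch, reusing the placeholder kvm_scale_tsc_l1()
name from above):

        case MSR_IA32_TSC: {
                u64 tsc_offset;

                if (msr_info->host_initiated) {
                        /* Userspace wants the L1 view of the TSC. */
                        tsc_offset = vcpu->arch.l1_tsc_offset;
                        msr_info->data = kvm_scale_tsc_l1(vcpu, rdtsc()) + tsc_offset;
                } else {
                        /* The guest reads the currently active (possibly L2) view. */
                        tsc_offset = vcpu->arch.tsc_offset;
                        msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + tsc_offset;
                }
                break;
        }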