On Tue, 2021-05-25 at 15:58 +0000, Sean Christopherson wrote:
> On Tue, May 25, 2021, Stamatis, Ilias wrote:
> > On Mon, 2021-05-24 at 18:44 +0000, Sean Christopherson wrote:
> > > Yes, but its existence is a complete hack. vmx->current_tsc_ratio has the same
> > > scope as vcpu->arch.tsc_scaling_ratio, i.e. vmx == vcpu == vcpu->arch. Unlike
> > > per-VMCS tracking, it should not be useful, keyword "should".
> > >
> > > What I meant by my earlier comment:
> > >
> > >   Its use in vmx_vcpu_load_vmcs() is basically "write the VMCS if we forgot to
> > >   earlier", which is all kinds of wrong.
> > >
> > > is that vmx_vcpu_load_vmcs() should never write vmcs.TSC_MULTIPLIER. The correct
> > > behavior is to set the field at VMCS initialization, and then immediately set it
> > > whenever the ratio is changed, e.g. on nested transition, from userspace, etc...
> > > In other words, my unclear feedback was to make it obsolete (and drop it) by
> > > fixing the underlying mess, not to just drop the optimization hack.
> >
> > I understood this and replied earlier. The right place for the hw multiplier
> > field to be updated is inside set_tsc_khz() in common code when the ratio
> > changes. However, this requires adding another vendor callback etc. As all
> > this is further refactoring, I believe it's better to leave this series as is,
> > i.e. only touching code that is directly related to nested TSC scaling, and not
> > try to do everything as part of the same series.
>
> But it directly impacts your code, e.g. the nested enter/exit flows would need
> to dance around the decache silliness. And I believe it even more directly
> impacts this series: kvm_set_tsc_khz() fails to handle the case where userspace
> invokes KVM_SET_TSC_KHZ while L2 is active.

Good catch!

> > This makes testing easier too.
>
> Hmm, sort of. Yes, the fewer patches/modifications in a series definitely makes
> the series itself easier to test.
> But stepping back and looking at the total
> cost of testing, I would argue that punting related changes to a later time
> increases the overall cost. E.g. if someone else picks up the clean up work,
> then they have to redo most, if not all, of the testing that you are already
> doing, including getting access to the proper hardware, understanding what tests
> to prioritize, etc... Whereas adding one more patch to your series is an
> incremental cost since you already have the hardware setup, know which tests to
> run, etc...
>
> > We can still implement these changes later.
>
> We can, but we shouldn't. Simply dropping vmx->current_tsc_ratio is not an
> option; it knowingly introduces a (minor) performance regression, for no reason
> other than wanting to avoid code churn. Piling more stuff on top of the flawed
> decache logic is impolite, as it adds more work for the person that ends up
> doing the cleanup. I would 100% agree if this were a significant cleanup and/or
> completely unrelated, but IMO that's not the case.
>
> Compile tested only...

Thank you.
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 029c9615378f..34ad7a17458a 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -90,6 +90,7 @@ KVM_X86_OP_NULL(has_wbinvd_exit)
>  KVM_X86_OP(get_l2_tsc_offset)
>  KVM_X86_OP(get_l2_tsc_multiplier)
>  KVM_X86_OP(write_tsc_offset)
> +KVM_X86_OP(write_tsc_multiplier)
>  KVM_X86_OP(get_exit_info)
>  KVM_X86_OP(check_intercept)
>  KVM_X86_OP(handle_exit_irqoff)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f099277b993d..a334ce7741ab 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1308,6 +1308,7 @@ struct kvm_x86_ops {
>  	u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
>  	u64 (*get_l2_tsc_multiplier)(struct kvm_vcpu *vcpu);
>  	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
> +	void (*write_tsc_multiplier)(struct kvm_vcpu *vcpu, u64 multiplier);
>
>  	/*
>  	 * Retrieve somewhat arbitrary exit information. Intended to be used
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b18f60463073..914afcceb46d 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1103,6 +1103,14 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  	vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
>  }
>
> +static void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 l1_multiplier)
> +{
> +	/*
> +	 * Handled when loading guest state since the ratio is programmed via
> +	 * MSR_AMD64_TSC_RATIO, not a field in the VMCB.
> +	 */
> +}
> +

Ok, what I really wanted to avoid was having to dig into the SVM code to see where exactly it sets the TSC multiplier, or having to implement svm_write_tsc_multiplier, as I knew AMD uses an MSR instead of a VMCB field. But if we are fine with introducing this as-is above (for now), I will include it in the series, apply the other small changes suggested, and re-post the patches.