On Fri, Sep 23, 2022, Maxim Levitsky wrote:
> On Tue, 2022-09-20 at 23:31 +0000, Sean Christopherson wrote:
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 2c96c43c313a..6475c882b359 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1132,6 +1132,17 @@ enum kvm_apicv_inhibit {
> >  	 * AVIC is disabled because SEV doesn't support it.
> >  	 */
> >  	APICV_INHIBIT_REASON_SEV,
> > +
> > +	/*
> > +	 * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> > +	 * deleted if any vCPU has x2APIC enabled as SVM doesn't provide fully
> > +	 * independent controls for AVIC vs. x2AVIC, and also because SVM
> > +	 * supports a "hybrid" AVIC mode for CPUs that support AVIC but not
> > +	 * x2AVIC. Note, this isn't a "full" inhibit and is tracked separately.
> > +	 * AVIC can still be activated, but KVM must not create SPTEs for the
> > +	 * APIC base. For simplicity, this is sticky.
> > +	 */
> > +	APICV_INHIBIT_REASON_X2APIC,
>
> Hi Sean!
>
> So assuming that I won't object to making it SVM specific (I still think
> that VMX should also inhibit this memslot because this is closer to x86 spec,
> but if you really want it this way, I won't fight over it):

Heh, I don't necessarily "want" it this way; it's more that I don't see a
compelling reason to change KVM's behavior and risk silently causing a
performance regression. If KVM didn't already have the "APIC base may have RAM
semantics" quirk, and/or if this were the initial APICv implementation and thus
had no possible users, then I would probably also vote to give APICv the same
treatment.

> I somewhat don't like this inhibit, because now it is used just to say
> 'I am AVIC'.
>
> What do you think if you just move the code that removes the memslot to SVM,
> to avic_set_virtual_apic_mode?
That suffers from the same SRCU issue (see below) :-/

Given the SRCU problem, I'd prefer to keep the management of the memslot in
common code, even though I agree it's a bit silly. And KVM_REQ_UNBLOCK is a
perfect fit for dealing with the SRCU issue, i.e. handling this in AVIC code
would require another hook on top of spreading the memslot management across
x86 and SVM code.

> > @@ -1169,10 +1180,11 @@ struct kvm_arch {
> >  	struct kvm_apic_map __rcu *apic_map;
> >  	atomic_t apic_map_dirty;
> >  
> > -	/* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
> > -	struct rw_semaphore apicv_update_lock;
> > -
> >  	bool apic_access_memslot_enabled;
> > +	bool apic_access_memslot_inhibited;
>
> So the apic_access_memslot_enabled currently tracks if the memslot is enabled.
> As I see later in the patch when you free the memslot, you set it to false,
> which means that if a vCPU is created after that (it can happen in theory),
> the memslot will be created again :(
>
> I say we need 'enabled', and 'allocated' booleans instead. Inhibit will set
> enabled to false, and then on next vcpu run, that will free the memslot.
>
> when enabled == false, the code needs to be changed to not allocate it again.

This should be handled already. apic_access_memslot_enabled is toggled from
true=>false if and only if apic_access_memslot_inhibited is set, and the
"enabled" flag is protected by slots_lock. Thus, newly created vCPUs are
guaranteed to see either apic_access_memslot_enabled==true or
apic_access_memslot_inhibited==true.

int kvm_alloc_apic_access_page(struct kvm *kvm)
{
	struct page *page;
	void __user *hva;
	int ret = 0;

	mutex_lock(&kvm->slots_lock);
	if (kvm->arch.apic_access_memslot_enabled ||
	    kvm->arch.apic_access_memslot_inhibited)    <=== prevents reallocation
		goto out;

	...

out:
	mutex_unlock(&kvm->slots_lock);
	return ret;
}

That could be made more obvious by adding a WARN in kvm_free_apic_access_page(),
i.e.
void kvm_free_apic_access_page(struct kvm *kvm)
{
	WARN_ON_ONCE(!kvm->arch.apic_access_memslot_inhibited);

	mutex_lock(&kvm->slots_lock);

	if (kvm->arch.apic_access_memslot_enabled) {
		__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 0, 0);
		kvm->arch.apic_access_memslot_enabled = false;
	}

	mutex_unlock(&kvm->slots_lock);
}

> > +
> > +	/* Protects apicv_inhibit_reasons */
> > +	struct rw_semaphore apicv_update_lock;
> >  	unsigned long apicv_inhibit_reasons;
> >  
> >  	gpa_t wall_clock;
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 99994d2470a2..70f00eda75b2 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -2394,9 +2394,26 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
> >  		}
> >  	}
> >  
> > -	if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
> > +	if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
> >  		kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
> >  
> > +		/*
> > +		 * Mark the APIC memslot as inhibited if x2APIC is enabled and
> > +		 * the x2APIC inhibit is required. The actual deletion of the
> > +		 * memslot is handled by vcpu_run() as SRCU may or may not be
> > +		 * held at this time, i.e. updating memslots isn't safe. Don't
> > +		 * check apic_access_memslot_inhibited, this vCPU needs to
> > +		 * ensure the memslot is deleted before re-entering the guest,
> > +		 * i.e. needs to make the request even if the inhibit flag was
> > +		 * already set by a different vCPU.
> > +		 */
> > +		if (vcpu->kvm->arch.apic_access_memslot_enabled &&
> > +		    static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC)) {
> > +			vcpu->kvm->arch.apic_access_memslot_inhibited = true;
> > +			kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
>
> You are about to remove the KVM_REQ_UNBLOCK in other patch series.

No, KVM_REQ_UNHALT is being removed. KVM_REQ_UNBLOCK needs to stay, although it
has a rather weird name, e.g. KVM_REQ_WORK would probably be better.
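To make the "enabled"/"inhibited" reasoning above concrete, here is a small user-space model in C. It is a sketch under stated assumptions, not KVM code: a pthread mutex stands in for kvm->slots_lock, __x86_set_memory_region() is replaced by a creation counter, WARN_ON_ONCE() is replaced by a hit counter, and every model_* name is hypothetical. It shows that once the inhibit flag is set and the memslot is freed, a vCPU created afterwards does not recreate the memslot.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant kvm->arch state; not KVM code. */
struct vm_model {
	pthread_mutex_t slots_lock;         /* models kvm->slots_lock */
	bool apic_access_memslot_enabled;   /* the private memslot exists */
	bool apic_access_memslot_inhibited; /* sticky, never cleared */
	int memslot_creations;              /* replaces __x86_set_memory_region() */
	int warn_hits;                      /* times the WARN condition was true */
};

static void model_vm_init(struct vm_model *vm)
{
	pthread_mutex_init(&vm->slots_lock, NULL);
	vm->apic_access_memslot_enabled = false;
	vm->apic_access_memslot_inhibited = false;
	vm->memslot_creations = 0;
	vm->warn_hits = 0;
}

/* Models kvm_alloc_apic_access_page(), which runs on vCPU creation. */
static int model_alloc_apic_access_page(struct vm_model *vm)
{
	int ret = 0;

	pthread_mutex_lock(&vm->slots_lock);
	/* Either flag being set means "do not (re)create the memslot". */
	if (vm->apic_access_memslot_enabled ||
	    vm->apic_access_memslot_inhibited)
		goto out;

	vm->memslot_creations++;
	vm->apic_access_memslot_enabled = true;
out:
	pthread_mutex_unlock(&vm->slots_lock);
	return ret;
}

/* Models the inhibit side of the kvm_lapic_set_base() hunk (sticky). */
static void model_mark_inhibited(struct vm_model *vm)
{
	vm->apic_access_memslot_inhibited = true;
}

/* Models kvm_free_apic_access_page() with the proposed WARN. */
static void model_free_apic_access_page(struct vm_model *vm)
{
	/* Models WARN_ON_ONCE(!inhibited); counted here instead of logged. */
	if (!vm->apic_access_memslot_inhibited)
		vm->warn_hits++;

	pthread_mutex_lock(&vm->slots_lock);
	if (vm->apic_access_memslot_enabled)
		vm->apic_access_memslot_enabled = false; /* slot deleted */
	pthread_mutex_unlock(&vm->slots_lock);
}
```

In this model, the sequence alloc, alloc, mark-inhibited, free, alloc creates the memslot exactly once, leaves it deleted at the end, and never trips the WARN condition, because the only path that clears "enabled" runs after "inhibited" is already set.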
> How about just raising KVM_REQ_APICV_UPDATE on current vCPU
> and having a special case in kvm_vcpu_update_apicv of
>
> if (apic_access_memslot_enabled == false && apic_access_memslot_allocated == true) {
> 	drop srcu lock

This was my initial thought as well, but the issue is that SRCU may or may not
be held, and so the unlock+lock would need to be conditional. That's
technically a solvable problem, as it's possible to detect if SRCU is held, but
I really don't want to rely on kvm_vcpu.srcu_depth for anything other than
proving that KVM doesn't screw up SRCU.

> 	free the memslot
> 	take srcu lock
> }
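The request-based deferral described in the kvm_lapic_set_base() comment can be illustrated with a second hypothetical model. It assumes a per-vCPU srcu_depth counter in the spirit of kvm_vcpu.srcu_depth, a single request bit standing in for KVM_REQ_UNBLOCK, and a run loop that always holds SRCU at the point where it services requests, so it can drop and retake SRCU unconditionally. The model_* names and MODEL_REQ_UNBLOCK do not exist in KVM, and locking is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request bit standing in for KVM_REQ_UNBLOCK. */
#define MODEL_REQ_UNBLOCK (1u << 0)

struct vm_model {
	bool apic_access_memslot_enabled;
	bool apic_access_memslot_inhibited;
	bool x2apic_inhibit_required; /* stands in for the vendor static_call */
};

struct vcpu_model {
	struct vm_model *vm;
	unsigned int requests;
	int srcu_depth; /* > 0: inside an SRCU read-side critical section */
};

/*
 * Models the kvm_lapic_set_base() hunk. It may be reached with or without
 * SRCU held, so it only records intent and never touches memslots.
 */
static void model_enable_x2apic(struct vcpu_model *vcpu)
{
	struct vm_model *vm = vcpu->vm;

	/* Deliberately not gated on "inhibited": each vCPU must see the slot gone. */
	if (vm->apic_access_memslot_enabled && vm->x2apic_inhibit_required) {
		vm->apic_access_memslot_inhibited = true;
		vcpu->requests |= MODEL_REQ_UNBLOCK;
	}
}

/* Deleting a memslot waits for SRCU readers, so the caller must not be one. */
static void model_free_apic_access_page(struct vcpu_model *vcpu)
{
	assert(vcpu->srcu_depth == 0);
	if (vcpu->vm->apic_access_memslot_enabled)
		vcpu->vm->apic_access_memslot_enabled = false;
}

/*
 * Models one pass of the run loop. Assumption: SRCU is always held when
 * requests are checked here, so dropping and retaking it is unconditional.
 * Returns whether the APIC memslot still existed at "guest entry".
 */
static bool model_vcpu_run_once(struct vcpu_model *vcpu)
{
	bool slot_present;

	vcpu->srcu_depth++;
	if (vcpu->requests & MODEL_REQ_UNBLOCK) {
		vcpu->requests &= ~MODEL_REQ_UNBLOCK;
		vcpu->srcu_depth--;
		model_free_apic_access_page(vcpu);
		vcpu->srcu_depth++;
	}
	slot_present = vcpu->vm->apic_access_memslot_enabled;
	vcpu->srcu_depth--;
	return slot_present;
}
```

With two vCPUs that both enable x2APIC before either runs, both end up with the request pending. Whichever runs first deletes the memslot, and the second still services its own request (a no-op delete) before "entering the guest". The conditional variant discussed above would instead need to know srcu_depth at the call site, which is exactly the dependency being avoided.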