On 11/11/19 13:30, Liran Alon wrote: > When L1 don't use TPR-Shadow to run L2, L0 configures vmcs02 without > TPR-Shadow and install intercepts on CR8 access (load and store). > > If L1 do not intercept L2 CR8 access, L0 intercepts on those accesses > will emulate load/store on L1's LAPIC TPR. If in this case L2 lowers > TPR such that there is now an injectable interrupt to L1, > apic_update_ppr() will request a KVM_REQ_EVENT which will trigger a call > to update_cr8_intercept() to update TPR-Threshold to highest pending IRR > priority. > > However, this update to TPR-Threshold is done while active vmcs is > vmcs02 instead of vmcs01. Thus, when later at some point L0 will > emulate an exit from L2 to L1, L1 will still run with high > TPR-Threshold. This will result in every VMEntry to L1 to immediately > exit on TPR_BELOW_THRESHOLD and continue to do so infinitely until > some condition will cause KVM_REQ_EVENT to be set. > (Note that TPR_BELOW_THRESHOLD exit handler do not set KVM_REQ_EVENT > until apic_update_ppr() will notice a new injectable interrupt for PPR) > > To fix this issue, change update_cr8_intercept() such that if L2 lowers > L1's TPR in a way that requires to lower L1's TPR-Threshold, save update > to TPR-Threshold and apply it to vmcs01 when L0 emulates an exit from > L2 to L1. Can you explain why the write shouldn't be done to vmcs02 as well? Paolo > - vmcs_write32(TPR_THRESHOLD, tpr_threshold); > + > + if (is_guest_mode(vcpu)) > + to_vmx(vcpu)->nested.l1_tpr_threshold = tpr_threshold; > + else > + vmcs_write32(TPR_THRESHOLD, tpr_threshold); > } > > void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu) > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h > index bee16687dc0b..43331dfafffe 100644 > --- a/arch/x86/kvm/vmx/vmx.h > +++ b/arch/x86/kvm/vmx/vmx.h > @@ -167,6 +167,9 @@ struct nested_vmx { > u64 vmcs01_debugctl; > u64 vmcs01_guest_bndcfgs; > > + /* to migrate it to L1 if L2 writes to L1's CR8 directly */ > + int l1_tpr_threshold; > + > u16 vpid02; > u16 last_vpid; > >