On Tue, Mar 04, 2025 at 10:01:25PM -0500, Maxim Levitsky wrote: > On Mon, 2025-03-03 at 22:14 +0000, Yosry Ahmed wrote: > > On Fri, Feb 28, 2025 at 09:17:52PM -0500, Maxim Levitsky wrote: > > > On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote: > > > > TLB_CONTROL is reset to TLB_CONTROL_DO_NOTHING on nested transitions to > > > > L2. This is unnecessary because it should always be > > > > TLB_CONTROL_DO_NOTHING at this point. > > > > > > > > The flow for setting TLB_CONTROL is as follows: > > > > 1. In vcpu_enter_guest(), servicing a TLB flush request may set it to > > > > TLB_CONTROL_FLUSH_ASID in svm_flush_tlb_asid(). > > > > 2. In svm_vcpu_run() -> pre_svm_run(), it may get upgraded to > > > > TLB_CONTROL_FLUSH_ALL_ASID when assigning a new ASID. > > > > 3. In svm_cpu_run(), it gets reset to TLB_CONTROL_DO_NOTHING after the > > > > guest is run. > > > > > > > > Hence, TLB_CONTROL is reset after each run and there is no need to do it > > > > again on every nested transition to L2. > > > > > > > > There is a TODO in nested_svm_transition_tlb_flush() about this reset > > > > crushing pending TLB flushes. Remove it, as the reset is not really > > > > crushing anything as explained above. > > > > > > I am not sure that we don't crush a pending tlb request: > > > > > > svm_flush_tlb_asid can also be called by KVM_REQ_TLB_FLUSH > > > and set the flush request in both vmcbs, thus later the nested_svm_exit_tlb_flush > > > can crush this request. > > > > How so? > > > > nested_svm_exit_tlb_flush() makes a KVM_REQ_TLB_FLUSH_GUEST request. > > svm_flush_tlb_asid() is called when servicing KVM_REQ_TLB_* requests. > > I am probably missing something but: > > Suppose KVM_REQ_TLB_FLUSH is raised and then processed while ordinary L1 entry is happening, > but nested state is allocated. > > (KVM_REQ_TLB_FLUSH can be raised anytime MMU wants a 'big hammer flush everything') > > In this case svm_flush_tlb_all will call svm_flush_tlb_asid on both vmcbs (see patch 8), > and that will set TLB_CONTROL_FLUSH_ASID in both vmcbs. > In particular it will be set in vmcb02. > > Later, maybe even hours later in theory, L1 issues VMRUN, we reach nested_vmcb02_prepare_control, > and crush the value (TLB_CONTROL_FLUSH_ASID) set in vmcb02. > > I think that this is what the removed comment referred to. When KVM_REQ_TLB_FLUSH is raised, we do not call svm_flush_tlb_all() immediately. We only call svm_flush_tlb_all() when the request is serviced in vcpu_enter_guest(): /* * Note, the order matters here, as flushing "all" TLB entries * also flushes the "current" TLB entries, i.e. servicing the * flush "all" will clear any request to flush "current". */ if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) kvm_vcpu_flush_tlb_all(vcpu); kvm_service_local_tlb_flush_requests(vcpu); IIUC, vcpu_enter_guest() will be called for L2 after nested_vmcb02_prepare_control() is called. My understanding is that the sequence of events is as follows: - L1 executes VMRUN, which is trapped and emulated by L0. - KVM executes handles the VM-exit in L0 by calling vmrun_interception() to setup VMCB02 in prepartion for running L2. This will call nested_svm_vmrun() -> enter_svm_guest_mode() -> nested_vmcb02_prepare_control() (setting tlb_ctl to TLB_CONTROL_DO_NOTHING). - Execution will pick up after the VMRUN instruction in L0, eventually getting to the loop in vcpu_run(), and calling vcpu_enter_guest() for L2. At this point any pending TLB flush requests (e.g. KVM_REQ_TLB_FLUSH) will be handled, and svm_flush_tlb_*() functions may be called to set tlb_ctl to a non-zero value (e.g. TLB_CONTROL_FLUSH_ASID). - A little bit later in svm_vcpu_run() -> pre_svm_run(), tlb_ctl may be upgraded to TLB_CONTROL_FLUSH_ALL_ASID if a new ASID is allocated. - After the guest is run, svm_cpu_run() resets tlb_ctl to TLB_CONTROL_DO_NOTHING. So nested_vmcb02_prepare_control() setting tlb_ctl to TLB_CONTROL_DO_NOTHING should have no effect because tlb_ctl is set after that anyway before L2 is run, and reset back to TLB_CONTROL_DO_NOTHING after L2 is run. I tried to clarify this in the commit log, but I don't think it is clear enough. I will try to add more details in the next version. Please correct me if I am wrong.