On Wed, Sep 07, 2022, Yuan Yao wrote:
> On Tue, Sep 06, 2022 at 09:26:33PM -0700, Mingwei Zhang wrote:
> > > > @@ -10700,6 +10706,12 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> > > >  		if (kvm_cpu_has_pending_timer(vcpu))
> > > >  			kvm_inject_pending_timer_irqs(vcpu);
> > > >
> > > > +		if (vcpu->arch.nested_get_pages_pending) {
> > > > +			r = kvm_get_nested_state_pages(vcpu);
> > > > +			if (r <= 0)
> > > > +				break;
> > > > +		}
> > > > +
> > >
> > > Will this leads to skip the get_nested_state_pages for L2 first time
> > > vmentry in every L2 running iteration ? Because with above changes
> > > KVM_REQ_GET_NESTED_STATE_PAGES is not set in
> > > nested_vmx_enter_non_root_mode() and
> > > vcpu->arch.nested_get_pages_pending is not checked in
> > > vcpu_enter_guest().
> > >
> > Good catch. I think the diff won't work when vcpu is runnable.

It works, but it's inefficient if the request comes from KVM_SET_NESTED_STATE.
The pending KVM_REQ_UNBLOCK that comes with the flag will prevent actually
running the guest.  Specifically, this chunk of code will detect the pending
request and bail out of vcpu_enter_guest().

	if (kvm_vcpu_exit_request(vcpu)) {
		vcpu->mode = OUTSIDE_GUEST_MODE;
		smp_wmb();
		local_irq_enable();
		preempt_enable();
		kvm_vcpu_srcu_read_lock(vcpu);
		r = 1;
		goto cancel_injection;
	}

But the inefficiency is a non-issue since "true" emulation of VM-Enter will flow
through this path (the VMRESUME/VMLAUNCH/VMRUN exit handler runs at the end of
vcpu_enter_guest()).

> > It only tries to catch the vcpu block case. Even for the vcpu block case,
> > the check of KVM_REQ_UNBLOCK is way too late. Ah, kvm_vcpu_check_block() is
> > called by kvm_vcpu_block() which is called by vcpu_block(). The warning is
> > triggered at the very beginning of vcpu_block(), i.e., within
> > kvm_arch_vcpu_runnable(). So, please ignore the trace in my previous email.
> >
> > In addition, my minor push back for that is
> > vcpu->arch.nested_get_pages_pending seems to be another
> > KVM_REQ_GET_NESTED_STATE_PAGES.
>
> Yeah, but in concept level it's not a REQ mask lives in the
> vcpu->requests which can be cached by e.g. kvm_request_pending().
> It's necessary to check vcpu->arch.nested_get_pages_pending in
> vcpu_enter_guest() if Sean's idea is to replace
> KVM_REQ_GET_NESTED_STATE_PAGES with nested_get_pages_pending.

Yes, the key is that it's not a request.  Requests have implicit properties:
e.g. as above, they effectively prevent running the vCPU until the request goes
away, they can be pended from other vCPUs, etc...  And the property that is most
relevant to this bug: except for special cases, requests only need to be
serviced before running the vCPU.

And the number of requests is limited due to them being stored in a bitmap.  x86
still has plenty of room due to kvm_vcpu.requests being a u64, but it's still
preferable to avoid using a request unless absolutely necessary.

For this case, since using a request isn't strictly needed and using a request
would require special casing that request, my strong preference is to not use a
request.

So yes, my idea is to "just" replace the request with a flag, but there are
subtly quite a few implications in not using a request.
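
To make the request-vs-flag distinction concrete, here is a rough standalone
sketch.  It is a toy model, not KVM code: every toy_* name is made up for
illustration, and the atomics and memory barriers that the real request
machinery depends on are deliberately left out.  It only shows that a pending
bit in the request bitmap blocks entry to the guest, while a plain flag does
nothing unless some path checks it explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request numbers for the toy model; not KVM's definitions. */
enum toy_req { TOY_REQ_UNBLOCK = 0, TOY_REQ_MAX = 64 };

struct toy_vcpu {
	uint64_t requests;		/* one bit per request, so at most 64 */
	bool nested_get_pages_pending;	/* plain flag, not part of the bitmap */
};

/* Pend a request.  The real thing is atomic so other vCPUs can do this. */
static inline void toy_make_request(struct toy_vcpu *v, unsigned int req)
{
	assert(req < TOY_REQ_MAX);
	v->requests |= UINT64_C(1) << req;
}

/* True if any request at all is pending. */
static inline bool toy_request_pending(const struct toy_vcpu *v)
{
	return v->requests != 0;
}

/* Test-and-clear a single request; returns true if it was pending. */
static inline bool toy_check_request(struct toy_vcpu *v, unsigned int req)
{
	uint64_t bit = UINT64_C(1) << req;
	bool was_pending = (v->requests & bit) != 0;

	v->requests &= ~bit;
	return was_pending;
}

/*
 * Would the toy "guest" actually run?  Any pending request forces a bail;
 * the flag is invisible here unless a check is added for it by hand.
 */
static inline bool toy_can_enter_guest(const struct toy_vcpu *v)
{
	return !toy_request_pending(v);
}
```

With only nested_get_pages_pending set, toy_can_enter_guest() still returns
true, which is the toy equivalent of the concern Yuan raised: whoever replaces
the request with a flag has to add the explicit checks.  Once
toy_make_request(&v, TOY_REQ_UNBLOCK) is called, entry is refused until
toy_check_request() clears the bit.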