Vivek Goyal <vgoyal@xxxxxxxxxx> writes:

> On Mon, May 25, 2020 at 04:41:17PM +0200, Vitaly Kuznetsov wrote:
>>
>
> [..]
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 0a6b35353fc7..c195f63c1086 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -767,7 +767,7 @@ struct kvm_vcpu_arch {
>>      u64 msr_val;
>>      u32 id;
>>      bool send_user_only;
>> -    u32 host_apf_reason;
>> +    u32 host_apf_flags;
>
> Hi Vitaly,
>
> What is host_apf_reason used for. Looks like it is somehow used in
> context of nested guests. I hope by now you have been able to figure
> it out.
>
> Is it somehow the case of that L2 guest takes a page fault exit
> and then L0 injects this event in L1 using exception. I have been
> trying to read this code but can't wrap my head around it.
>
> I am still concerned about the case of nested kvm. We have discussed
> apf mechanism but never touched nested part of it. Given we are
> touching code in nested kvm part, want to make sure it is not broken
> in new design.
>

Sorry I missed this. I think we've touched on the nested topic a bit
already:
https://lore.kernel.org/kvm/87lfluwfi0.fsf@xxxxxxxxxxxxxxxxxxxx/

But let me try to explain the whole thing and maybe someone will point
out what I'm missing.

The problem being solved: an L2 guest is running and it hits a page which
is not present *in L0*. Instead of pausing the *L1* vCPU completely, we
want to let L1 know about the problem so it can run something else
(e.g. another guest or just another application).

What's different between this and the 'normal' APF case? When an L2 guest
is running, the (physical) CPU is in 'guest' mode so we can't inject #PF
there. Actually, we can, but L2 may get confused and we're not even sure
it's L2's fault, that L2 supports APF and so on. We want to make L1 deal
with the issue.

How does it work then? We inject #PF and L1 sees it as a #PF VMEXIT.
It needs to know about APF (thus KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT) but
the handling is exactly the same as do_pagefault(): L1's
kvm_handle_page_fault() checks the APF area (shared between L0 and L1)
and either pauses a task or resumes a previously paused one. This can be
an L2 guest or something else.

What is 'host_apf_reason'? It is a copy of the 'reason' field from
'struct kvm_vcpu_pv_apf_data' which we read upon #PF VMEXIT. It indicates
that the #PF VMEXIT is synthetic.

How does it work with the patchset: the 'page not present' case remains
the same. The 'page ready' case now goes through interrupts so it may not
get handled immediately. External interrupts will be handled by L0 in
host mode (when L2 is not running). For the 'page ready' case the L1
hypervisor doesn't need any special handling, the kvm_async_pf_intr() irq
handler will work correctly.

I've smoke tested this with VMX and nothing immediately blew up.

-- 
Vitaly
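
For anyone trying to follow the #PF VMEXIT path described above, here is
a toy userspace model of what L1 does with the 'reason' field. This is
only a sketch and not the kernel code: every toy_* name is made up for
illustration, the numeric reason values are placeholders, and clearing
the shared field after reading it is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder reason values; 0 means "no APF event pending". */
#define TOY_REASON_NONE             0u
#define TOY_REASON_PAGE_NOT_PRESENT 1u
#define TOY_REASON_PAGE_READY       2u

/* Stand-in for the area shared between L0 and L1
 * (modelled loosely on 'struct kvm_vcpu_pv_apf_data'). */
struct toy_apf_data {
	uint32_t reason;
};

/* What L1 decides to do when it sees a #PF VMEXIT. */
enum toy_action {
	TOY_REAL_PAGE_FAULT,  /* ordinary #PF, take the normal path */
	TOY_PAUSE_TASK,       /* synthetic: page not present in L0 */
	TOY_RESUME_TASK       /* synthetic: page became ready */
};

/*
 * Copy 'reason' from the shared area into the per-vCPU copy
 * (the 'host_apf_reason' role) and pick an action. A non-zero
 * copy marks the #PF VMEXIT as synthetic.
 */
static enum toy_action toy_handle_pf_vmexit(struct toy_apf_data *shared,
					    uint32_t *host_apf_reason)
{
	assert(shared != NULL && host_apf_reason != NULL);

	*host_apf_reason = shared->reason;
	shared->reason = TOY_REASON_NONE;  /* assumption: consume the event */

	switch (*host_apf_reason) {
	case TOY_REASON_PAGE_NOT_PRESENT:
		return TOY_PAUSE_TASK;
	case TOY_REASON_PAGE_READY:
		return TOY_RESUME_TASK;
	default:
		return TOY_REAL_PAGE_FAULT;
	}
}
```

With the patchset, only the 'page not present' branch would still be
reached from the #PF VMEXIT; 'page ready' would arrive through the
interrupt handler instead, so the TOY_RESUME_TASK branch models the old
delivery path.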