On Mon, Feb 3, 2025 at 5:35 PM Doug Covelli <doug.covelli@xxxxxxxxxxxx> wrote:
> OK. It seems like fully embracing the in-kernel APIC is the way to go
> especially considering it really simplifies using KVM's support for nested
> virtualization. Speaking of nested virtualization we have been working on
> adding support for that and would like to propose a couple of changes:
>
> - Add an option for L0 to handle backdoor accesses from CPL3 code running in L2.
> On a #GP nested_vmx_l0_wants_exit can check if this option is enabled and KVM
> can handle the #GP like it would if it had been from L1 (exit to userlevel iff
> it is a backdoor access, otherwise deliver the fault to L2). When combined with
> enable_vmware_backdoor this will allow L0 to optionally handle backdoor accesses
> from CPL3 code running in L2. This is needed for cases such as running VMware
> tools in a Windows VM with VBS enabled. For other cases such as running tools
> in a Windows VM in an ESX VM we still want L1 to handle the backdoor accesses
> from L2.

I think this makes sense and could be an argument to KVM_ENABLE_CAP.

> - Extend KVM_EXIT_MEMORY_FAULT for permission faults (e.g. the guest attempting
> to write to a page that has been protected by userlevel calling mprotect). This
> is useful for cases where we want synchronous detection of guest writes such as
> lazy snapshots (dirty page tracking is no good for this case). Currently
> permission faults result in KVM_RUN returning EFAULT which we handle by
> interpreting the instruction as we do not know the guest physical address
> associated with the fault.

Yes, this makes sense too, though you might want to look into userfaultfd
as well.

We had something planned using attributes, but I don't see any issue
extending it to EFAULT. Maybe it would have to be yet another
KVM_ENABLE_CAP; considering that it would break your existing code, there
might be someone else in the wild doing it.

Paolo
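
For illustration only, here is a rough sketch of how the userspace side of
the second proposal might look. It assumes the extended exit would report
the guest physical address for permission faults, which is the point of the
proposal and not something KVM does today. Every type and name below is a
hypothetical stand-in, not the KVM UAPI; real code would read the exit
reason and fault details from struct kvm_run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two ways a write to a protected page
 * could be reported to userspace. Not taken from <linux/kvm.h>. */
enum exit_kind {
    EXIT_EFAULT,        /* today: KVM_RUN fails with EFAULT, no GPA known */
    EXIT_MEMORY_FAULT   /* proposed: exit carries the faulting GPA */
};

struct fault_info {
    enum exit_kind kind;
    uint64_t gpa;       /* meaningful only for EXIT_MEMORY_FAULT */
    uint64_t size;      /* meaningful only for EXIT_MEMORY_FAULT */
};

/* What the VMM does next (hypothetical names). */
enum action {
    ACT_INTERPRET_INSN,         /* decode the instruction to find the address */
    ACT_SNAPSHOT_AND_UNPROTECT  /* copy the page, lift mprotect, re-enter */
};

/* Decide how to service a guest write fault. When the GPA is reported,
 * store the 4 KiB-aligned page address in *page_gpa so the caller can
 * snapshot that page synchronously; otherwise leave *page_gpa untouched
 * and fall back to interpreting the instruction. */
enum action handle_write_fault(const struct fault_info *f, uint64_t *page_gpa)
{
    assert(f != NULL && page_gpa != NULL);

    if (f->kind == EXIT_MEMORY_FAULT) {
        *page_gpa = f->gpa & ~(uint64_t)0xfff;
        return ACT_SNAPSHOT_AND_UNPROTECT;
    }
    return ACT_INTERPRET_INSN;
}
```

The fallback branch is kept on purpose: if the new behaviour were gated
behind a KVM_ENABLE_CAP, a VMM running on an older kernel would still see
EFAULT and need the instruction-interpretation path.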