On Fri, Feb 25, 2022 at 10:24 PM Xiaoyao Li <xiaoyao.li@xxxxxxxxx> wrote:
>
> On 2/26/2022 12:53 PM, Jim Mattson wrote:
> > On Fri, Feb 25, 2022 at 8:25 PM Jim Mattson <jmattson@xxxxxxxxxx> wrote:
> >>
> >> On Fri, Feb 25, 2022 at 8:07 PM Xiaoyao Li <xiaoyao.li@xxxxxxxxx> wrote:
> >>>
> >>> On 2/25/2022 11:13 PM, Paolo Bonzini wrote:
> >>>> On 2/25/22 16:12, Xiaoyao Li wrote:
> >>>>>>>>
> >>>>>>>
> >>>>>>> I don't like the idea of making things up without notifying userspace
> >>>>>>> that this is fictional. How is my customer running nested VMs supposed
> >>>>>>> to know that L2 didn't actually shutdown, but L0 killed it because the
> >>>>>>> notify window was exceeded? If this information isn't reported to
> >>>>>>> userspace, I have no way of getting the information to the customer.
> >>>>>>
> >>>>>> Then, maybe a dedicated software define VM exit for it instead of
> >>>>>> reusing triple fault?
> >>>>>>
> >>>>>
> >>>>> Second thought, we can even just return Notify VM exit to L1 to tell
> >>>>> L2 causes Notify VM exit, even thought Notify VM exit is not exposed
> >>>>> to L1.
> >>>>
> >>>> That might cause NULL pointer dereferences or other nasty occurrences.
> >>>
> >>> IMO, a well written VMM (in L1) should handle it correctly.
> >>>
> >>> L0 KVM reports no Notify VM Exit support to L1, so L1 runs without
> >>> setting Notify VM exit. If a L2 causes notify_vm_exit with
> >>> invalid_vm_context, L0 just reflects it to L1. In L1's view, there is no
> >>> support of Notify VM Exit from VMX MSR capability. Following L1 handler
> >>> is possible:
> >>>
> >>> a) if (notify_vm_exit available & notify_vm_exit enabled) {
> >>>       handle in b)
> >>>    } else {
> >>>       report unexpected vm exit reason to userspace;
> >>>    }
> >>>
> >>> b) similar handler like we implement in KVM:
> >>>    if (!vm_context_invalid)
> >>>       re-enter guest;
> >>>    else
> >>>       report to userspace;
> >>>
> >>> c) no Notify VM Exit related code (e.g. old KVM), it's treated as
> >>>    unsupported exit reason
> >>>
> >>> As long as it belongs to any case above, I think L1 can handle it
> >>> correctly. Any nasty occurrence should be caused by incorrect handler in
> >>> L1 VMM, in my opinion.
> >>
> >> Please test some common hypervisors (e.g. ESXi and Hyper-V).
> >
> > I took a look at KVM in Linux v4.9 (one of our more popular guests),
> > and it will not handle this case well:
> >
> >         if (exit_reason < kvm_vmx_max_exit_handlers
> >             && kvm_vmx_exit_handlers[exit_reason])
> >                 return kvm_vmx_exit_handlers[exit_reason](vcpu);
> >         else {
> >                 WARN_ONCE(1, "vmx: unexpected exit reason 0x%x\n", exit_reason);
> >                 kvm_queue_exception(vcpu, UD_VECTOR);
> >                 return 1;
> >         }
> >
> > At least there's an L1 kernel log message for the first unexpected
> > NOTIFY VM-exit, but after that, there is silence. Just a completely
> > inexplicable #UD in L2, assuming that L2 is resumable at this point.
>
> At least there is a message to tell L1 a notify VM exit is triggered in
> L2. Yes, the inexplicable #UD won't be hit unless L2 triggers Notify VM
> exit with invalid_context, which is malicious to L0 and L1.

There is only an L1 kernel log message *the first time*. That's not
good enough. And this is just one of the myriad of possible L1
hypervisors.

> If we use triple_fault (i.e., shutdown), then no info to tell L1 that
> it's caused by Notify VM exit with invalid context. Triple fault needs
> to be extended and L1 kernel needs to be enlightened. It doesn't help
> old guest kernel.
>
> If we use Machine Check, it's somewhat same inexplicable to L2 unless
> it's enlightened. But it doesn't help old guest kernel.
>
> Anyway, for Notify VM exit with invalid context from L2, I don't see a
> good solution to tell L1 VMM it's a "Notify VM exit with invalid context
> from L2" and keep all kinds of L1 VMM happy, especially for those with
> old kernel versions.

I agree that there is no way to make every conceivable L1 happy.
That's why the information needs to be surfaced to the L0 userspace. I
contend that any time L0 kvm violates the architectural specification
in its emulation of L1 or L2, the L0 userspace *must* be informed.
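
To make the "only the first time" point concrete, here is a small user-space model of the v4.9 fallback path quoted earlier in this thread. It is only a sketch, not kernel code: every name in it (model_handle_exit, model_exit_handlers, the two counters, the table size) is made up for illustration, the static flag merely stands in for WARN_ONCE()'s one-shot behaviour, and the counter increment stands in for kvm_queue_exception(vcpu, UD_VECTOR).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the quoted v4.9 dispatch; not KVM symbols. */
#define MODEL_MAX_EXIT_HANDLERS 4

typedef int (*exit_handler_t)(void);

static int handle_known_exit(void)
{
	return 1;
}

/* Only reason 0 has a handler in this model; the other slots are NULL. */
static exit_handler_t model_exit_handlers[MODEL_MAX_EXIT_HANDLERS] = {
	handle_known_exit,
};

static int warnings_logged;	/* log lines the modelled L1 emitted */
static int ud_injected;		/* #UDs the modelled L1 queued for L2 */

static int model_handle_exit(unsigned int exit_reason)
{
	static bool warned;	/* stands in for WARN_ONCE()'s one-shot flag */

	if (exit_reason < MODEL_MAX_EXIT_HANDLERS &&
	    model_exit_handlers[exit_reason])
		return model_exit_handlers[exit_reason]();

	if (!warned) {
		warned = true;
		warnings_logged++;
		fprintf(stderr, "vmx: unexpected exit reason 0x%x\n",
			exit_reason);
	}

	/* Stands in for kvm_queue_exception(vcpu, UD_VECTOR). */
	ud_injected++;
	return 1;
}
```

Calling model_handle_exit() repeatedly with a reason that has no handler bumps ud_injected on every call but warnings_logged only once, which is the silence after the first unexpected exit described above.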