On Thu, 2021-02-25 at 17:41 +0200, Maxim Levitsky wrote: > clone of "kernel-starship-5.11" > > Maxim Levitsky (4): > KVM: x86: determine if an exception has an error code only when > injecting it. > KVM: x86: mmu: initialize fault.async_page_fault in walk_addr_generic > KVM: x86: pending exception must be be injected even with an injected > event > kvm: WIP separation of injected and pending exception > > arch/x86/include/asm/kvm_host.h | 23 +- > arch/x86/include/uapi/asm/kvm.h | 14 +- > arch/x86/kvm/mmu/paging_tmpl.h | 1 + > arch/x86/kvm/svm/nested.c | 57 +++-- > arch/x86/kvm/svm/svm.c | 8 +- > arch/x86/kvm/vmx/nested.c | 109 +++++---- > arch/x86/kvm/vmx/vmx.c | 14 +- > arch/x86/kvm/x86.c | 377 +++++++++++++++++++------------- > arch/x86/kvm/x86.h | 6 +- > include/uapi/linux/kvm.h | 1 + > 10 files changed, 374 insertions(+), 236 deletions(-) > > -- > 2.26.2 > git-publish ate the cover letter, so here it goes: RFC/WIP: KVM: separate injected and pending exception + few more fixes This is a result of my deep dive on why do we need special .inject_page_fault for cases when TDP paging is disabled on the host for running nested guests. First 3 patches fix relatively small issues I found. Some of them can be squashed with patch 4 assuming that it is accepted. Patch 4 is WIP and I would like to hear your feedback on it: Basically the issue is that during delivery of one exception we (emulator or mmu) can signal another exception, and if the new exception is intercepted by the nested guest, we should do VM exit with former exception signaled in exitintinfo (or equivalent IDT_VECTORING_INFO_FIELD) We sadly either loose the former exception and signal an VM exit, or deliver a #DF since we only store either pending or injected exception and we merge them in kvm_multiple_exception although we shouldn't. Only later we deliver the VM exit in .check_nested_events when already wrong data is in the pending/injected exception. There are multiple ways to fix it, and I choose somewhat hard but I think the most correct way of dealing with it. 1. I split pending and injected exceptions in kvm_vcpu_arch thus allowing both to co-exist. 2. I made kvm_multiple_exception avoid merging exceptions, but instead only setup either pending or injected exception (there is another bug that we don't deliver triple fault as nested vm exit, which I'll fix later) 3. I created kvm_deliver_pending_exception which its goal is to convert the pending exception to injected exception or deliver a VM exit with both pending and injected exception/interrupt/nmi. It itself only deals with non-vmexit cases while it calls a new 'kvm_x86_ops.nested_ops->deliver_exception' to deliver exception VM exit if needed. The later implementation is simple as it just checks if we should VM exit and then delivers both exceptions (or interrupt and exception, in case interrupt delivery was interrupted by exception). This new callback returns 0 if it had delivered this VM exit, 0 if no vm exit is needed, or -EBUSY when nested run is pending, in which case the exception delivery will be retried after nested run is done. kvm_deliver_pending_exception is called each time we inject pending events and all exception related code is removed from .check_nested_events which now only deals with pending interrupts and events such as INIT,NMI,SMI, etc. New KVM cap is added to expose both pending and injected exception via KVM_GET_VCPU_EVENTS/KVM_SET_VCPU_EVENTS If this cap is not enabled, and we have both pending and injected exception when KVM_GET_VCPU_EVENTS is called, the exception is delivered. The code was tested with SVM, and it currently seems to pass all the tests I usually do (including nested migration). KVM unit tests seem to pass as well. I still almost sure that I broke something since this is far from trivial change, therefore this is RFC/WIP. Also VMX side was not yet tested other than basic compile and I am sure that there are at least few issues that remain to be fixed. I should also note that with these patches I can boot nested guests with npt=0 without any changes to .inject_page_fault. I also wrote 2 KVM unit tests to test for this issue, and for similar issue when interrupt is lost when delivery of it causes exception. These tests pass now. Best regards, Maxim Levitsky