On Fri, Oct 22, 2021 at 11:46:17AM +0200, Paolo Bonzini wrote:
> On 22/10/21 04:59, Hou Wenlong wrote:
> >When KVM_CAP_X86_USER_SPACE_MSR cap is enabled, userspace can control
> >MSR accesses. In normal scenario, RDMSR/WRMSR can be intercepted, but
> >when kvm.force_emulation_prefix is enabled, RDMSR/WRMSR with kvm prefix
> >would trigger an UD and cause instruction emulation. If MSR accesses is
> >filtered, em_rdmsr()/em_wrmsr() returns X86EMUL_IO_NEEDED, but it is
> >ignored by x86_emulate_instruction(). Then guest continues execution,
> >but RIP has been updated to point to RDMSR/WRMSR in handle_ud(), so
> >RDMSR/WRMSR can be intercepted and guest exits to userspace finally by
> >mistake. Such behaviour leads to two vm exits and wastes one instruction
> >emulation.
> >
> >After let x86_emulate_instruction() returns 0 for RDMSR/WRMSR emulation,
> >if it needs to exit to userspace, its complete_userspace_io callback
> >would call kvm_skip_instruction() to skip instruction. But for vmx,
> >VMX_EXIT_INSTRUCTION_LEN in vmcs is invalid for UD, it can't be used to
> >update RIP, kvm_emulate_instruction() should be used instead. As for
> >svm, nRIP in vmcb is 0 for UD, so kvm_emulate_instruction() is used.
> >But for nested svm, I'm not sure, since svm_check_intercept() would
> >change nRIP.
>
> Hi, can you provide a testcase for this bug using the
> tools/testing/selftests/kvm framework?
>
> Thanks,
>
> Paolo

Hi Paolo,

There is already a testcase in the KVM selftests (test_msr_filter_allow()
in tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c), which is
mentioned in Patch 2. That testcase exercises MSR access emulation with
kvm.force_emulation_prefix enabled, and it passes. But I think the logic
may not be right. As I explained in Patch 2, x86_emulate_instruction()
ignored X86EMUL_IO_NEEDED, so the guest would continue execution, but RIP
had already been updated to point to RDMSR/WRMSR in handle_ud().
Then RDMSR/WRMSR would be intercepted and the guest would exit to
userspace only on that second VM exit. Although the final result looked
right, the instruction emulation done in the first VM exit was wasted.
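
To make the sequence concrete, here is a toy user-space model of the two
paths. It is only a sketch, not kernel code: every toy_* name is made up
for illustration, and it simply counts exits and emulations under the
assumption that the MSR access is filtered and must go to userspace.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the emulator return codes discussed above. */
enum toy_emul_result { TOY_EMUL_CONTINUE, TOY_EMUL_IO_NEEDED };

struct toy_stats {
	int vm_exits;           /* VM exits taken for one prefixed RDMSR/WRMSR */
	int emulations;         /* instruction emulations performed */
	bool reached_userspace; /* userspace finally saw the MSR access */
};

/* Emulating a filtered MSR access always asks for userspace help. */
static enum toy_emul_result toy_emulate_msr(struct toy_stats *s)
{
	s->emulations++;
	return TOY_EMUL_IO_NEEDED;
}

/*
 * honor_io_needed == false models the behaviour described in this mail:
 * the emulator result is dropped, the guest resumes at the unprefixed
 * RDMSR/WRMSR (RIP was already moved past the prefix), and that
 * instruction is intercepted, giving a second exit.
 * honor_io_needed == true models exiting to userspace from the first exit.
 */
struct toy_stats toy_run(bool honor_io_needed)
{
	struct toy_stats s = { 0, 0, false };
	enum toy_emul_result r;

	s.vm_exits++;                 /* #UD exit from the prefixed instruction */
	r = toy_emulate_msr(&s);

	if (r == TOY_EMUL_IO_NEEDED && honor_io_needed) {
		s.reached_userspace = true;
		return s;
	}

	s.vm_exits++;                 /* intercepted RDMSR/WRMSR, second exit */
	s.reached_userspace = true;
	return s;
}
```

In this model toy_run(false) ends with two VM exits and one emulation whose
result was thrown away, while toy_run(true) reaches userspace after a
single exit. Both end in userspace, which is why the existing testcase
cannot tell them apart.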