On 2018-06-09 07:48, Liran Alon wrote:
> So I think the path to how to further bisect bug is very clear here:
> 1) First, attempt to change rsm_interception() to pass
> EMULTYPE_NO_REEXECUTE and see if it makes a difference. (BTW, you can
> submit a commit that adds this EMULTYPE_NO_REEXECUTE as it should be
> present here)
> 2) If that doesn't work, attempt to remove
> rsm_ins_bytes and instead pass NULL. If this works, this means that
> there are cases which raise RSM interception on bytes different than
> "\x0f\xaa".

Neither of those helps.

> Anyway, having a look at
> # echo 1 >/sys/kernel/debug/tracing/events/kvm/enable
> # cat /sys/kernel/debug/tracing/trace_pipe
> Should help debug the issue in case you discover this patch wasn't
> the root-cause.

Tracing further, it seems the issue is that L0 is completely unaware of when L2 enters SMM mode. It doesn't know the right SMBASE, nor whether L2 is in SMM at all. The emulation delivers a #UD, and then L2 triggers a shutdown:

d..1 27652.855246: kvm_entry: vcpu 4
.... 27652.855248: kvm_exit: reason rsm rip 0xfd399 info 0 0
.... 27652.855248: kvm_nested_vmexit: rip: 0x00000000000fd399 reason: rsm ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
.... 27652.855254: kvm_emulate_insn: 0:fd399:0f aa (prot32)
.... 27652.855257: kvm_inj_exception: #UD (0x0)
d..1 27652.855258: kvm_entry: vcpu 4
.... 27652.855259: kvm_exit: reason shutdown rip 0xfd399 info 0 0
.... 27652.855259: kvm_nested_vmexit: rip: 0x00000000000fd399 reason: shutdown ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x80000b08 ext_int_err: 0x00000000
.... 27652.855259: kvm_nested_vmexit_inject: reason: shutdown ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x80000b08 ext_int_err: 0x00000000
d..1 27652.855260: kvm_entry: vcpu 4

L0 has no idea that L2 is in SMM. When L1 boots, I do see SMM entries/exits and correctly emulated RSM intercepts.
But without the bad commit, I get this:

.... 12724.894359: kvm_exit: reason UD excp rip 0xfd399 info 0 0
.... 12724.894359: kvm_nested_vmexit: rip: 0x00000000000fd399 reason: UD excp ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
.... 12724.894359: kvm_nested_vmexit_inject: reason: UD excp ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000

So it's still a #UD. The problem seems to be what happens when the RSM handler triggers a #UD because it thinks the guest isn't in SMM. Delivering that #UD to L1 would be fine, since L1 knows how to handle it, but it gets delivered straight to L2 instead. Without the RSM intercept, the #UD triggers a nested vmexit, and things work.

I tried following the exception injection path, but I'm a bit lost. inject_pending_event calls into kvm_x86_ops->check_nested_events, which I think is supposed to turn some events into nested vmexits. However, it is only implemented for VMX, not SVM. Then kvm_x86_ops->queue_exception gets called with vcpu->arch.exception.injected = true. svm_queue_exception has a path to nested_svm_check_exception, but only when injected == false. Even if I get rid of that check, nested_svm_check_exception calls nested_svm_intercept, which returns NESTED_EXIT_HOST, and that goes nowhere again.

-- 
Hector Martin "marcan" (marcan@xxxxxxxxx)
Public Key: https://mrcn.st/pub