Re: Regression in nested SVM on 4.16 (bisected)

----- marcan@xxxxxxxxx wrote:

> Hi,
> 
> I have a nested SVM setup with a 4.15 L0, 4.14.20 L1/L2 (L0 hardware is
> a Threadripper). After upgrading L0 to 4.16, L2 guests ceased to boot
> (no video output from BIOS, it seems to hang early in an endless vmexit
> loop). 4.17 still contains the regression.
> 
> I bisected it to:
> 
> commit 7607b7174405aec7441ff6c970833c463114040a
> Author: Brijesh Singh <brijesh.singh@xxxxxxx>
> Date:   Mon Feb 19 10:14:44 2018 -0600
> 
>     KVM: SVM: install RSM intercept
> 
> Perhaps the L2 BIOS uses SMM/rsm and this is interacting poorly with an
> L1 that does not contain this patch? I'm not familiar with the KVM code;
> any suggestions/pointers welcome.
> 

Before the patch you mentioned (7607b7174405 ("KVM: SVM: install RSM intercept")),
an RSM that executed went through:
emulate_on_interception() -> x86_emulate_instruction(vcpu, 0, EMULTYPE_NO_REEXECUTE, NULL, 0);
and now RSM goes through:
rsm_interception() -> x86_emulate_instruction(vcpu, 0, 0, rsm_ins_bytes, 2);

Looking at x86_emulate_instruction(), the only difference made by passing rsm_ins_bytes as a parameter
is that the function no longer fetches the RSM bytes from guest memory; instead, the x86
emulator fetches them from the passed array to reach em_rsm().
(This was done because under AMD SEV, guest memory cannot be read by the hypervisor.)

If for some reason emulation of the RSM instruction fails (EMULATION_FAILED returned from x86_emulate_insn()),
reexecute_instruction() is called and, because EMULTYPE_NO_REEXECUTE wasn't passed, it may unprotect
some MMU pages and return true, which causes x86_emulate_instruction() to return EMULATE_DONE,
which in turn causes the instruction to be retried.
In contrast, before this patch EMULTYPE_NO_REEXECUTE was passed, and therefore x86_emulate_instruction()
would always call handle_emulation_failure() in this case, which would queue a #UD in case of
is_guest_mode(vcpu).
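
To make the difference between the two paths concrete, here is a toy model in plain C. It is not kernel code: model_emulate(), the outcome enum and the can_unprotect parameter are made-up names for illustration, and the flag value is only a placeholder for the real constant in the KVM headers. It only mirrors the decision described above, nothing else from x86_emulate_instruction().

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder value for illustration; the real constant is defined in the KVM headers. */
#define EMULTYPE_NO_REEXECUTE (1 << 4)

/* Hypothetical outcomes of the modelled failure path. */
enum outcome {
    OUTCOME_EMULATED, /* emulation succeeded, nothing special happens */
    OUTCOME_RETRY,    /* guest re-executes the same instruction */
    OUTCOME_UD        /* failure handler runs and a #UD is queued */
};

/*
 * Toy model of the behaviour described above.
 *   emulation_failed: the emulator reported a failure for the instruction
 *   can_unprotect:    stands in for reexecute_instruction() returning true
 */
enum outcome model_emulate(int emulation_type, bool emulation_failed,
                           bool can_unprotect)
{
    if (!emulation_failed)
        return OUTCOME_EMULATED;

    /* Re-execution is only considered when NO_REEXECUTE was not requested. */
    if (!(emulation_type & EMULTYPE_NO_REEXECUTE) && can_unprotect)
        return OUTCOME_RETRY;

    return OUTCOME_UD;
}

/* Usage: contrast the old and new flags for an RSM whose emulation fails. */
void model_self_check(void)
{
    /* Old path (flag set): the failure always ends in #UD. */
    assert(model_emulate(EMULTYPE_NO_REEXECUTE, true, true) == OUTCOME_UD);
    /* New path (flag clear): the same failure may turn into a retry. */
    assert(model_emulate(0, true, true) == OUTCOME_RETRY);
}
```

If the retry keeps failing in the same way, the model would loop on OUTCOME_RETRY forever, which would at least be consistent with the endless vmexit loop you describe.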

So I think the path to further narrowing down the bug is clear:
1) First, try changing rsm_interception() to pass EMULTYPE_NO_REEXECUTE and see if it makes a difference.
(BTW, you can submit a commit that adds EMULTYPE_NO_REEXECUTE, as it should be present here.)
2) If that doesn't work, try removing rsm_ins_bytes and passing NULL instead.
If this works, it means there are cases that raise an RSM interception on bytes other than
"\x0f\xaa".
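
For step 2, the question is whether the bytes at the intercepted instruction really are the plain RSM opcode. A minimal sketch of such a check in standalone C follows. starts_with_plain_rsm() is a hypothetical helper, not an existing KVM function, and it assumes you already have the instruction bytes in a buffer (which is only possible when guest memory is readable, i.e. not under SEV).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Opcode bytes of a plain RSM instruction, as quoted above. */
static const uint8_t rsm_opcode[] = { 0x0f, 0xaa };

/*
 * Hypothetical helper: returns true when the buffer begins with exactly
 * the two RSM opcode bytes, false when it is too short or starts with
 * anything else (for example a leading prefix byte).
 */
bool starts_with_plain_rsm(const uint8_t *insn, size_t len)
{
    assert(insn != NULL);

    if (len < sizeof(rsm_opcode))
        return false;

    return memcmp(insn, rsm_opcode, sizeof(rsm_opcode)) == 0;
}
```

A buffer holding 0x0f 0xaa passes the check, while one that begins with any other byte does not. Whether such an encoding can actually reach the RSM intercept is exactly what step 2 would tell you; the sketch makes no claim about that.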

Anyway, having a look at:
# echo 1 >/sys/kernel/debug/tracing/events/kvm/enable
# cat /sys/kernel/debug/tracing/trace_pipe
should help debug the issue in case you discover this patch wasn't the root cause.

Hope it helps,
-Liran



