On Wed, Aug 28, 2019 at 10:19:51AM +0000, Jan Dakinevich wrote:
> On Tue, 27 Aug 2019 07:50:30 -0700
> Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
> > Yikes, this patch and the previous have quite the sordid history.
> >
> > The non-void return from inject_emulated_exception() was added by commit
> >
> >   ef54bcfeea6c ("KVM: x86: skip writeback on injection of nested exception")
> >
> > for the purpose of skipping writeback.  At the time, the above blob in the
> > decode flow didn't exist.
> >
> > Decode exception handling was added by commit
> >
> >   6ea6e84309ca ("KVM: x86: inject exceptions produced by x86_decode_insn")
> >
> > but it was dead code even then.  The patch discussion[1] even point out that
> > it was dead code, i.e. the change probably should have been reverted.
> >
> > Peng Hao and Yi Wang later ran into what appears to be the same bug you're
> > hitting[2][3], and even had patches temporarily queued[4][5], but the
> > patches never made it to mainline as they broke kvm-unit-tests.  Fun side
> > note, Radim even pointed out[4] the bug fixed by patch 1/3.
> >
> > So, the patches look correct, but there's the open question of why the
> > hypercall test was failing for Paolo.
>
> Sorry, I'm little confused. Could you please, point me which test or tests
> were broken? I've just run kvm-unit-test and I see same results with and
> without my changes.
>
> > I've tried to reproduce the #DF to
> > no avail.

Aha!  The #DF occurs if patch 2/3, but not patch 3/3, is applied, and the
VMware backdoor is enabled.  The backdoor is off by default, which is why
only Paolo was seeing the #DF.

To handle the VMware backdoor, KVM intercepts #GP faults, which includes
the non-canonical #GP from the hypercall unit test.  With only patch 2/3
applied, x86_emulate_instruction() injects a #GP for the non-canonical RIP
but returns EMULATE_FAIL instead of EMULATE_DONE.
EMULATE_FAIL causes handle_exception_nmi() (or gp_interception() for SVM)
to re-inject the original #GP because it thinks emulation failed due to a
non-VMware opcode.  Applying patch 3/3 resolves the issue as
x86_emulate_instruction() returns EMULATE_DONE after injecting the #GP.

TL;DR: Swap the order of patches and everything should be hunky dory.

Please rebase to the latest kvm/queue, which has an equivalent to patch 1/3.
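
For illustration only, here is a toy model of the flow described above.  It
is not the KVM code: every toy_* name, the struct, and the patch3_applied
flag are made up for this sketch, and the only things borrowed from the
real code are the EMULATE_DONE/EMULATE_FAIL distinction and the rule that
a #GP raised while another #GP is pending escalates to #DF.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; only the two result names come from the thread. */
enum emul_result { EMULATE_DONE, EMULATE_FAIL };

struct toy_vcpu {
	int  pending_gp;    /* #GPs queued while handling this exit */
	bool double_fault;  /* set once a second #GP is queued */
};

/* Queue a #GP; a second one on top of a pending #GP becomes a #DF. */
static void toy_queue_gp(struct toy_vcpu *v)
{
	if (v->pending_gp++)
		v->double_fault = true;
}

/*
 * Models the emulator hitting the non-canonical RIP: it injects a #GP
 * (behavior with patch 2/3), and reports EMULATE_DONE only when the
 * behavior of patch 3/3 is also present.
 */
static enum emul_result toy_emulate(struct toy_vcpu *v, bool patch3_applied)
{
	toy_queue_gp(v);
	return patch3_applied ? EMULATE_DONE : EMULATE_FAIL;
}

/*
 * Models the #GP intercept used for the VMware backdoor: on EMULATE_FAIL
 * it assumes a non-VMware opcode and re-injects the original #GP.
 */
static void toy_handle_gp_intercept(struct toy_vcpu *v, bool patch3_applied)
{
	if (toy_emulate(v, patch3_applied) == EMULATE_FAIL)
		toy_queue_gp(v);
}

/* Returns true if the modeled sequence ends in a #DF. */
static bool toy_run(bool patch3_applied)
{
	struct toy_vcpu v = { 0 };

	toy_handle_gp_intercept(&v, patch3_applied);
	return v.double_fault;
}
```

In this model toy_run(false) ends in a double fault because the #GP is
queued twice, while toy_run(true) queues it once and does not.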