https://bugzilla.kernel.org/show_bug.cgi?id=216867

--- Comment #2 from Eric Li (ercli@xxxxxxxxxxx) ---
On Tue, 2023-01-03 at 22:05 +0000, bugzilla-daemon@xxxxxxxxxx wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216867
>
> --- Comment #1 from Sean Christopherson (seanjc@xxxxxxxxxx) ---
> On Fri, Dec 30, 2022, bugzilla-daemon@xxxxxxxxxx wrote:
>
> > My code performs the following experiment repeatedly on 3 CPUs:
> >
> > * Initially, "ptr" at address 0xb8000 (VGA memory mapped I/O) is set to 0.
> > * CPU 0 writes 0x12345678 to ptr, then increases counter "count0".
> > * In an infinite loop, CPU 1 tries to exchange ptr with register EAX
> >   (which contains 0) using the XCHG instruction. If CPU 1 sees
> >   0x12345678, it increases counter "count1".
> > * CPU 2's behavior is similar to CPU 1, except it increases counter
> >   "count2" when it sees 0x12345678.
> >
> > Ideally, after each experiment there should always be
> > count1 + count2 = count0. However, in KVM, there may be
> > count1 + count2 > count0. This is because CPU 0 writes 0x12345678 to
> > ptr once, but CPU 1 and CPU 2 both get 0x12345678 in XCHG. Note that
> > the XCHG instruction always implements the locking protocol.
> >
> > There is also a deadlock after running the experiment a few times.
> > However, I am not trying to explain it for now.
>
> Is the suspect deadlock in userspace, the guest, or in the host kernel?

The deadlock happens in the guest. It is due to how my experiment is
implemented. It is not directly related to KVM.

> > Guessed cause:
> >
> > I guess that KVM emulates the XCHG instruction that accesses 0xb8000.
> > The call stack should be:
> >
> > ...
> > x86_emulate_instruction (arch/x86/kvm/x86.c)
> > x86_emulate_insn (arch/x86/kvm/emulate.c)
> > writeback (arch/x86/kvm/emulate.c)
> > segmented_cmpxchg (arch/x86/kvm/emulate.c)
> > emulator_cmpxchg_emulated (arch/x86/kvm/x86.c, ->cmpxchg_emulated)
> > emulator_try_cmpxchg_user (arch/x86/kvm/x86.c)
> > ...
> > CMPXCHG instruction
> >
> > Suppose CPU 2 wants to write 0 to ptr using writeback(), expecting ptr
> > to already contain 0x12345678. However, CPU 1 changes the content of
> > ptr to 0. So:
> > * The CMPXCHG instruction fails (clears ZF).
> > * emulator_try_cmpxchg_user returns 1.
> > * emulator_cmpxchg_emulated() returns X86EMUL_CMPXCHG_FAILED.
> > * segmented_cmpxchg() returns X86EMUL_CMPXCHG_FAILED.
> > * writeback() returns X86EMUL_CMPXCHG_FAILED.
> > * x86_emulate_insn() returns EMULATION_OK.
> >
> > Thus, I think the root cause of this bug is that x86_emulate_insn()
> > ignores the X86EMUL_CMPXCHG_FAILED error. The correct behavior should
> > be retrying the emulation using the updated value (similar to
> > load-linked/store-conditional).
>
> KVM does retry the emulation, albeit in a very roundabout and non-robust
> way. On X86EMUL_CMPXCHG_FAILED, x86_emulate_insn() skips the EIP update
> and doesn't writeback GPRs. x86_emulate_instruction() is flawed and
> emulates single-step, but the "eip" written should be the original RIP,
> i.e. shouldn't advance past the instructions being emulated. The
> single-step mess should be fixed, but I doubt that's the root cause here.

I see, thanks for the explanation. Now the retrying code looks correct to
me (though I agree that the code could have been written in a better way).

> Is there a memslot for 0xb8000? I assume not since KVM is emulating (have
> you actually verified that, e.g. with tracepoints?). KVM's ABI doesn't
> support atomic MMIO operations, i.e. if there's no memslot, KVM will
> effectively drop the LOCK semantics.
> If that's indeed what's happening, you should see
>
>   kvm: emulating exchange as write
>
> in the host dmesg (just once though).

You are right. I see "kvm: emulating exchange as write" when I run the
guest I wrote. Looks like this is the check that causes KVM to drop LOCK on
VGA MMIO:

>         gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
>
>         if (gpa == INVALID_GPA ||
>             (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
>                 goto emul_write;

Closing this bug since LOCK on MMIO is not supported by KVM's ABI.
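
For readers of the archive, the counting invariant from the experiment can
be modeled in plain C. This is only a rough single-threaded sketch, not KVM
code and not the guest code from this report. The names (struct counts,
xchg_indivisible, round_with_lock, round_without_lock) are made up for
illustration, and the interleaving in round_without_lock() is an assumed
ordering chosen to show what can go wrong once the exchange is no longer
indivisible. It does not describe how KVM orders the accesses internally.

```c
#include <stdint.h>

#define MAGIC 0x12345678u

/* Hypothetical bookkeeping for one round of the experiment. */
struct counts {
    unsigned count0; /* writes of MAGIC performed by CPU 0 */
    unsigned count1; /* times CPU 1 observed MAGIC */
    unsigned count2; /* times CPU 2 observed MAGIC */
};

/*
 * Models an indivisible exchange: nothing else can run between the
 * read of the old value and the store of the new one.
 */
static uint32_t xchg_indivisible(uint32_t *ptr, uint32_t val)
{
    uint32_t old = *ptr;
    *ptr = val;
    return old;
}

/* One round in which every exchange is indivisible. */
static struct counts round_with_lock(void)
{
    struct counts c = {0, 0, 0};
    uint32_t ptr = 0;

    ptr = MAGIC;                            /* CPU 0 writes once */
    c.count0++;

    if (xchg_indivisible(&ptr, 0) == MAGIC) /* CPU 1 exchanges */
        c.count1++;
    if (xchg_indivisible(&ptr, 0) == MAGIC) /* CPU 2 exchanges */
        c.count2++;
    return c;
}

/*
 * One round in which each exchange is split into a separate read and
 * write, with the two CPUs interleaved (assumed ordering).
 */
static struct counts round_without_lock(void)
{
    struct counts c = {0, 0, 0};
    uint32_t ptr = 0;

    ptr = MAGIC;                            /* CPU 0 writes once */
    c.count0++;

    uint32_t seen1 = ptr;                   /* CPU 1 reads */
    uint32_t seen2 = ptr;                   /* CPU 2 reads before CPU 1 stores */
    ptr = 0;                                /* CPU 1 stores */
    ptr = 0;                                /* CPU 2 stores */

    if (seen1 == MAGIC)
        c.count1++;
    if (seen2 == MAGIC)
        c.count2++;
    return c;
}
```

In this model round_with_lock() keeps count1 + count2 equal to count0, while
round_without_lock() lets both readers observe the single write, which
matches the count1 + count2 > count0 symptom reported above.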