I roughly understand the explanation above. The main cause of this behavior
seems to be the chaotic VM memory layout produced by the syzkaller template
settings. In fact, you can even see that the IDT region in the code doesn't
contain any exception handling code, which is quite amusing :)

I would also like to ask about the earlier point that the IDT is located in
emulated MMIO space. How can I verify this, and where can I find the relevant
code that sets up the MMIO region?

> The guest loops because the guest's IDT is located in emulated MMIO space,
> and as suspected above, KVM refuses to emulate HLT for L2.

Also, I'm curious which technique is used to get the following kind of logging
information. I'd like to be able to capture every ENTRY and EXIT event during
the run:

  repro-1289 [019] d.... 140.314684: kvm_exit: vcpu 0 reason EXCEPTION_NMI rip 0x1 info1 0x0000000000004000 info2 0x0000000000000000 intr_info 0x80000301 error_code 0x00000000
  repro-1289 [019] ..... 140.314685: kvm_nested_vmexit: vcpu 0 reason EXCEPTION_NMI rip 0x1 info1 0x0000000000004000 info2 0x0000000000000000 intr_info 0x80000301 error_code 0x00000000
  repro-1289 [019] ..... 140.314688: kvm_inj_exception: #DB
  repro-1289 [019] d.... 140.314688: kvm_entry: vcpu 0, rip 0x1
  repro-1289 [019] d.... 140.314704: kvm_exit: vcpu 0 reason EPT_VIOLATION rip 0x1 info1 0x0000000000000181 info2 0x0000000080000301 intr_info 0x00000000 error_code 0x00000000
  repro-1289 [019] ..... 140.314706: kvm_nested_vmexit: vcpu 0 reason EPT_VIOLATION rip 0x1 info1 0x0000000000000181 info2 0x0000000080000301 intr_info 0x00000000 error_code 0x00000000
  repro-1289 [019] ..... 140.314706: kvm_page_fault: vcpu 0 rip 0x1 address 0x0000000000001050 error_code 0x181
  repro-1289 [019] ..... 140.314708: kvm_inj_exception: #DB [reinjected]
  repro-1289 [019] d.... 140.314709: kvm_entry: vcpu 0, rip 0x1

Sean Christopherson <seanjc@xxxxxxxxxx> wrote on Thu, Jan 16, 2025 at 23:24:
>
> +KVM and LKML for archival, as this is not a DoS
>
> On Thu, Jan 16, 2025, chichen241 wrote:
> > It seems that the attachment content is not convenient for you to see, so I
> > will reuse the email content to describe it.
>
> ...
>
> > syz_kvm_setup_cpu(/*fd=*/vmfd, /*cpufd=*/vcpufd, /*usermem=*/mem,
> >                   /*text=*/&nop_text, /*ntext*/ 1, /*flags=*/-1,
> >                   /*opts=*/opts, /*nopt=*/1);
> > // The nested vm will run '\x90\xf4', the vm will try to emulate the hlt
> > // instruction and fail, entry endless loop.
> > ioctl(vcpufd, KVM_RUN, NULL);
> > printf("The front kvm_run will caught in loop. This code will not be executed");
> > }
> > ```
> > linux kernel version: 6.12-rc7
> > Also I checked my mailbox and didn't see any question from Sean. Maybe there's some mistake?
>
> For posterity:
>
> > > virtualization. When an L2 guest attempts to emulate an instruction
>
> How did you coerce KVM into emulating HLT from L2?
>
> > > using the x86_emulate_instruction() function, and the instruction to
> > > be emulated is hlt, the x86_decode_emulated_instruction() function
> > > used for instruction decoding does not support parsing the hlt
> > > instruction.
>
> KVM should parse HLT just fine, I suspect the issue is that KVM _intentionally_
> refuses to emulate HLT from L2, because encountering HLT in the emulator when L2
> is active either requires the guest to be playing TLB games (e.g. generate an
> emulated MMIO exit on a MOV, patch the MOV into a HLT), or it requires enabling
> an off-by-default, "for testing purposes only" KVM module param.
>
> > > As a result, x86_decode_emulated_instruction() returns
> > > ctxt->execute as null, causing the L2 guest to fail to execute the hlt
> > > instruction properly. Subsequently, KVM enters an infinite loop,
>
> Define "infinite loop", i.e. what are the bounds of the loop?  If the "loop" is
> KVM re-entering the guest on the same instruction over and over, then everything
> is working as intended.
>
> > > repeatedly invoking x86_emulate_instruction() to perform the same
> > > operation. This issue does not occur when the instruction to be
> > > emulated by L2 is another standard instruction.
> > >
> > > Therefore, I am wondering whether this constitutes a denial-of-service
> > > (DoS) vulnerability and whether a CVE number can be assigned.
>
> Unless your reproducer causes a hard hang in KVM, or prevents L1 from gaining
> control from L2, e.g. via a (virtual) interrupt, this is not a DoS.  I can imagine
> scenarios where L2 can put itself into an infinite loop, i.e. DoS itself, but
> that's not a vulnerability in any reasonable sense of things.
>
> > > Generally, for software emulation in L1 guests, KVM's
> > > x86_emulate_instruction() function will, after parsing the instruction
> > > with x86_decode_emulated_instruction(), attempt to use
> > > retry_instruction() to retry instruction execution.
>
> No, retry_instruction() is specifically for cases where KVM fails to emulate an
> instruction _and_ the emulation was triggered by a write to a guest PTE that KVM
> is shadowing, i.e. a guest page that KVM has made read-only.  If certain criteria
> are met, KVM will unprotect the page, i.e. make it writable again, and resume
> the guest to let the CPU retry the instruction.
>
> > ## DESCRIPTION
> > In this file, most of the code is from
> > syzkaller (executor/common_kvm_amd64.h). I mainly call the `syz_kvm_setup_cpu`
> > function and run the vm using ioctl `kvm_run`. First I use
> > `syz_kvm_setup_cpu` to set up the vm to run a nested vm. The second call to
> > `syz_kvm_setup_cpu` turns on the TF bit in the eflags register of the
> > nested vm and lets the nested vm run the `nop;hlt` code.
> > When running kvm_run, the code will begin looping.
> > ## ANALYSE
> > The nested vm tries to emulate the `hlt` code but fails, so it keeps retrying, caught in an endless loop.
>
> The guest loops because the guest's IDT is located in emulated MMIO space,
> and as suspected above, KVM refuses to emulate HLT for L2.
>
> The single-step #DB induced by RFLAGS.TF=1 triggers an EPT Violation as a result
> of the CPU trying to vector the #DB with the IDT residing in non-existent memory.
> At this point KVM *should* kick out to host userspace, as userspace is responsible
> for dealing with the emulated MMIO access during exception vectoring.
>
>   repro-1289 [019] d.... 140.314684: kvm_exit: vcpu 0 reason EXCEPTION_NMI rip 0x1 info1 0x0000000000004000 info2 0x0000000000000000 intr_info 0x80000301 error_code 0x00000000
>   repro-1289 [019] ..... 140.314685: kvm_nested_vmexit: vcpu 0 reason EXCEPTION_NMI rip 0x1 info1 0x0000000000004000 info2 0x0000000000000000 intr_info 0x80000301 error_code 0x00000000
>   repro-1289 [019] ..... 140.314688: kvm_inj_exception: #DB
>   repro-1289 [019] d.... 140.314688: kvm_entry: vcpu 0, rip 0x1
>   repro-1289 [019] d.... 140.314704: kvm_exit: vcpu 0 reason EPT_VIOLATION rip 0x1 info1 0x0000000000000181 info2 0x0000000080000301 intr_info 0x00000000 error_code 0x00000000
>   repro-1289 [019] ..... 140.314706: kvm_nested_vmexit: vcpu 0 reason EPT_VIOLATION rip 0x1 info1 0x0000000000000181 info2 0x0000000080000301 intr_info 0x00000000 error_code 0x00000000
>   repro-1289 [019] ..... 140.314706: kvm_page_fault: vcpu 0 rip 0x1 address 0x0000000000001050 error_code 0x181
>   repro-1289 [019] ..... 140.314708: kvm_inj_exception: #DB [reinjected]
>   repro-1289 [019] d.... 140.314709: kvm_entry: vcpu 0, rip 0x1
>
> KVM misses the weird edge case, and instead ends up trying to emulate the
> instruction at the current RIP.  That instruction happens to be HLT, which KVM
> doesn't support for L2 (nested guests), and so KVM injects #UD.
>
>   repro-1289 [019] d.... 140.314732: kvm_exit: vcpu 0 reason EPT_VIOLATION rip 0x1 info1 0x00000000000001aa info2 0x0000000080000301 intr_info 0x00000000 error_code 0x00000000
>   repro-1289 [019] ..... 140.314749: kvm_emulate_insn: 0:1:f4 (prot32)
>   repro-1289 [019] ..... 140.314751: kvm_emulate_insn: 0:1:f4 (prot32) failed
>   repro-1289 [019] ..... 140.314752: kvm_inj_exception: #UD
>
> Vectoring the #UD suffers the same fate as the #DB, and so KVM unintentionally
> puts the vCPU into an endless loop.
>
>   repro-1289 [019] d.... 140.314767: kvm_exit: vcpu 0 reason EPT_VIOLATION rip 0x1 info1 0x00000000000001aa info2 0x0000000080000306 intr_info 0x00000000 error_code 0x00000000
>   repro-1289 [019] ..... 140.314767: kvm_nested_vmexit: vcpu 0 reason EPT_VIOLATION rip 0x1 info1 0x00000000000001aa info2 0x0000000080000306 intr_info 0x00000000 error_code 0x00000000
>   repro-1289 [019] ..... 140.314768: kvm_page_fault: vcpu 0 rip 0x1 address 0x0000000000000f78 error_code 0x1aa
>   repro-1289 [019] ..... 140.314778: kvm_emulate_insn: 0:1:f4 (prot32)
>   repro-1289 [019] ..... 140.314779: kvm_emulate_insn: 0:1:f4 (prot32) failed
>
> > ## QUESTION
> > The phenomenon is due to the kvm's emulate function can't emulate all the
> > instructions.
>
> No, the issue is that KVM doesn't detect a weird edge case where the *guest* has
> messed up, and instead of effectively terminating the VM, KVM puts it into an
> infinite loop of sorts.
>
> Amusingly, this edge case was just "fixed" for both VMX and SVM[*] (expected to
> land in v6.14).  In quotes because "fixing" the problem really means killing
> the VM instead of letting it loop.
>
> [1/7] KVM: x86: Add function for vectoring error generation
>       https://github.com/kvm-x86/linux/commit/11c98fa07a79
> [2/7] KVM: x86: Add emulation status for unhandleable vectoring
>       https://github.com/kvm-x86/linux/commit/5c9cfc486636
> [3/7] KVM: x86: Unprotect & retry before unhandleable vectoring check
>       https://github.com/kvm-x86/linux/commit/704fc6021b9e
> [4/7] KVM: VMX: Handle vectoring error in check_emulate_instruction
>       https://github.com/kvm-x86/linux/commit/47ef3ef843c0
> [5/7] KVM: SVM: Handle vectoring error in check_emulate_instruction
>       https://github.com/kvm-x86/linux/commit/7bd7ff99110a
> [6/7] selftests: KVM: extract lidt into helper function
>       https://github.com/kvm-x86/linux/commit/4e9427aeb957
> [7/7] selftests: KVM: Add test case for MMIO during vectoring
>       https://github.com/kvm-x86/linux/commit/62e41f6b4f36
>
> [*] https://lore.kernel.org/all/173457555486.3295983.11848882309599168611.b4-ty@xxxxxxxxxx