[Bug 219009] New: Random host reboots on Ryzen 7000/8000 using nested VMs (vls suspected)

https://bugzilla.kernel.org/show_bug.cgi?id=219009

            Bug ID: 219009
           Summary: Random host reboots on Ryzen 7000/8000 using nested
                    VMs (vls suspected)
           Product: Virtualization
           Version: unspecified
          Hardware: AMD
                OS: Linux
            Status: NEW
          Severity: high
          Priority: P3
         Component: kvm
          Assignee: virtualization_kvm@xxxxxxxxxxxxxxxxxxxx
          Reporter: zaltys@xxxxxxxxx
        Regression: No

Running nested VMs on AMD Ryzen 7000/8000 (Zen 4) CPUs results in random host
reboots.

There is no kernel panic, no log entry, and no relevant output on the serial
console. It is as if the platform were simply hard reset. The time to
reproduce seems to vary from system to system and may depend on the workload
and even the specific CPU model.

I can reproduce it with kernel 6.9.7 and qemu 9.0 on a Ryzen 7950X3D in under
one hour by using KVM -> Windows 10/11 with Hyper-V services on, or KVM ->
Windows 10/11 with 3 VBox VMs (also Win11) running. Other people have
reproduced it repeatedly on Ryzen 7700, 7600 and 8700GE, including with
KVM -> KVM -> Linux.[1] I have also seen customers of Hetzner (a company
offering Ryzen-based dedicated servers) complaining about similar random
reboots.

I tried looking up errata for Ryzen 7000/8000, but could not find any
published, so I decided to check the errata for EPYC 9004 [2], which is the
same Zen 4 architecture as Ryzen 7000/8000. It lists a nesting-related erratum,
#1495 (on page 49), which mentions that using Virtualized VMLOAD/VMSAVE can
result in an MCE and/or a system reset.

Based on the erratum mentioned above, I reconfigured my system with
kvm_amd.vls=0, and for me the random reboots with nested virtualization
stopped. The same was reported by several people in [1].
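
For anyone trying to confirm whether the workaround is actually in effect on
their host, a minimal sketch follows. It assumes the kvm_amd module is loaded
and exposes its parameters in the usual sysfs location
(/sys/module/kvm_amd/parameters). The helper names read_param and is_enabled
are made up for illustration and are not part of any existing tool.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location for the parameters of the loaded kvm_amd module.
PARAM_DIR = Path("/sys/module/kvm_amd/parameters")


def read_param(name: str, base: Path = PARAM_DIR) -> Optional[str]:
    """Return the raw value of a kvm_amd parameter, or None if unreadable."""
    try:
        return (base / name).read_text().strip()
    except OSError:
        # Module not loaded, parameter missing, or no permission to read it.
        return None


def is_enabled(raw: Optional[str]) -> Optional[bool]:
    """Map a sysfs value to a bool; None means the state is unknown."""
    if raw is None:
        return None
    if raw in ("1", "Y", "y"):
        return True
    if raw in ("0", "N", "n"):
        return False
    return None


if __name__ == "__main__":
    # "nested" tells whether nested virtualization is on at all,
    # "vls" whether Virtualized VMLOAD/VMSAVE is in use.
    for name in ("nested", "vls"):
        state = is_enabled(read_param(name))
        label = {True: "enabled", False: "disabled", None: "unknown"}[state]
        print(f"kvm_amd.{name}: {label}")
```

If vls is still reported as enabled after adding kvm_amd.vls=0 to the kernel
command line, the option most likely did not reach the module; it presumably
has to be set at boot or at module load time rather than at runtime.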

Somebody from AMD must be asked to confirm whether this really is a Ryzen
7000/8000 hardware bug, and whether there is a better fix than disabling VLS,
as disabling it has a performance hit. If disabling it is the only fix, then
kvm_amd.vls=0 must be the default for Ryzen 7000/8000.

[1]
https://www.reddit.com/r/Proxmox/comments/1cym3pl/nested_virtualization_crashing_ryzen_7000_series/
[2]
https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revision-guides/57095-PUB_1_01.pdf

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.



