On Sat, 2024-07-06 at 11:20 +0000, bugzilla-daemon@xxxxxxxxxx wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219009
>
>             Bug ID: 219009
>            Summary: Random host reboots on Ryzen 7000/8000 using nested
>                     VMs (vls suspected)
>            Product: Virtualization
>            Version: unspecified
>           Hardware: AMD
>                 OS: Linux
>             Status: NEW
>           Severity: high
>           Priority: P3
>          Component: kvm
>           Assignee: virtualization_kvm@xxxxxxxxxxxxxxxxxxxx
>           Reporter: zaltys@xxxxxxxxx
>         Regression: No
>
> Running nested VMs on AMD Ryzen 7000/8000 (Zen4) CPUs results in random host
> reboots.
>
> There is no kernel panic, no log entries, no relevant output to serial console.
> It is as if the platform is simply hard reset. The time to reproduce seems to
> vary from system to system and can depend on the workload and even the specific
> CPU model.
>
> I can reproduce it with kernel 6.9.7 and qemu 9.0 on Ryzen 7950X3D in under one
> hour by using KVM -> Windows 10/11 with Hyper-V services on, or KVM -> Windows
> 10/11 with 3 VBox VMs (also Win11) running. Other people have repeatedly
> reproduced it on Ryzen 7700, 7600 and 8700GE, including KVM -> KVM -> Linux.[1]
> I have also seen Hetzner (a company offering Ryzen based dedicated servers)
> customers complaining about similar random reboots.
>
> I tried looking up errata for Ryzen 7000/8000, but could not find any
> published, so I decided to check the errata for EPYC 9004 [2], which is the
> same Zen4 arch as Ryzen 7000/8000. It has nesting related bug #1495 (on page
> 49), which mentions that using Virtualized VMLOAD/VMSAVE can result in MCE
> and/or system reset.
>
> Based on the erratum mentioned above, I reconfigured my system with
> kvm_amd.vls=0 and for me random reboots with nested virtualization stopped.
> The same was reported by several people from [1].
>
> Somebody from AMD must be asked to confirm if it is really a Ryzen 7000/8000
> hardware bug, and if there is a better fix than disabling VLS, as it has a
> performance hit. If disabling it is the only fix, then kvm_amd.vls=0 must be
> the default for Ryzen 7000/8000.
>
> [1]
> https://www.reddit.com/r/Proxmox/comments/1cym3pl/nested_virtualization_crashing_ryzen_7000_series/
> [2]
> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revision-guides/57095-PUB_1_01.pdf
>

Hi! Can someone from AMD take a look at this bug: