Re: [Bug 219009] New: Random host reboots on Ryzen 7000/8000 using nested VMs (vls suspected)

On Sat, 2024-07-06 at 11:20 +0000, bugzilla-daemon@xxxxxxxxxx wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219009
> 
>             Bug ID: 219009
>            Summary: Random host reboots on Ryzen 7000/8000 using nested
>                     VMs (vls suspected)
>            Product: Virtualization
>            Version: unspecified
>           Hardware: AMD
>                 OS: Linux
>             Status: NEW
>           Severity: high
>           Priority: P3
>          Component: kvm
>           Assignee: virtualization_kvm@xxxxxxxxxxxxxxxxxxxx
>           Reporter: zaltys@xxxxxxxxx
>         Regression: No
> 
> Running nested VMs on AMD Ryzen 7000/8000 (Zen4) CPUs results in random host
> reboots.
> 
> There is no kernel panic, no log entries, no relevant output to serial console.
> It is as if the platform is simply hard reset. The time to reproduce it seems
> to vary from system to system and can depend on the workload and even the
> specific CPU model.
> 
> I can reproduce it with kernel 6.9.7 and qemu 9.0 on a Ryzen 7950X3D in under
> one hour by running KVM -> Windows 10/11 with Hyper-V services on, or KVM ->
> Windows 10/11 with 3 VBox VMs (also Win11) running. Other people have
> repeatedly reproduced it on Ryzen 7700, 7600 and 8700GE, including with KVM ->
> KVM -> Linux.[1] I have also seen customers of Hetzner (a company offering
> Ryzen-based dedicated servers) complaining about similar random reboots.
> 
> I tried looking up errata for Ryzen 7000/8000, but could not find any
> published, so I decided to check the errata for EPYC 9004 [2], which is the
> same Zen4 arch as Ryzen 7000/8000. It lists a nesting-related erratum, #1495
> (on page 49), which mentions that using Virtualized VMLOAD/VMSAVE can result
> in an MCE and/or system reset.
> 
> Based on the erratum mentioned above, I reconfigured my system with
> kvm_amd.vls=0, and for me the random reboots with nested virtualization
> stopped. The same was reported by several people in [1].
> 
> Somebody from AMD must be asked to confirm whether this really is a Ryzen
> 7000/8000 hardware bug, and whether there is a better fix than disabling VLS,
> as that has a performance hit. If disabling it is the only fix, then
> kvm_amd.vls=0 must be the default for Ryzen 7000/8000.
> 
> [1]
> https://www.reddit.com/r/Proxmox/comments/1cym3pl/nested_virtualization_crashing_ryzen_7000_series/
> [2]
> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revision-guides/57095-PUB_1_01.pdf
> 
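
For anyone who wants to confirm whether the kvm_amd.vls=0 workaround from the
report is actually in effect on a host, here is a minimal sketch. It assumes
the kvm_amd module is loaded and exposes the parameter at the usual sysfs
location, /sys/module/kvm_amd/parameters/vls; the helper name vls_state is
made up for illustration and is not part of any tool.

```python
from pathlib import Path

# Assumed sysfs location of the parameter; only present when kvm_amd is loaded.
VLS_PARAM = Path("/sys/module/kvm_amd/parameters/vls")


def vls_state(param_path: Path = VLS_PARAM) -> str:
    """Return 'enabled', 'disabled' or 'unknown' for Virtualized VMLOAD/VMSAVE."""
    try:
        raw = param_path.read_text().strip()
    except OSError:
        # Module not loaded, not an AMD host, or the file is not readable.
        return "unknown"
    # Accept both the integer and the boolean (Y/N) sysfs spellings.
    if raw in ("1", "Y", "y"):
        return "enabled"
    if raw in ("0", "N", "n"):
        return "disabled"
    return "unknown"


if __name__ == "__main__":
    print(f"kvm_amd.vls: {vls_state()}")
```

Besides passing kvm_amd.vls=0 on the kernel command line as in the report, the
setting can presumably be made persistent with a line "options kvm_amd vls=0"
in a file under /etc/modprobe.d/ (the file name is up to the admin), followed
by a module reload or reboot.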

Hi!

Can someone from AMD take a look at this bug:

