On Tue, 2023-12-19 at 12:22 +0800, Baoquan He wrote:
> Add Andrew to CC as Andrew helps to pick kexec/kdump patches.

Ah, thanks, I didn't realise that Andrew pulls in the kexec patches.

>
> On 12/13/23 at 08:40am, James Gowans wrote:
> ......
> > This has been tested by doing a kexec on x86_64 and aarch64.
>
> Hi James,
>
> Thanks for this great patch. My colleagues have opened a bug in RHEL to
> track this and to try to verify this patch. However, they can't reproduce
> the issue this patch is fixing. Could you tell us more about where and how
> to reproduce it so that we can understand it better? Thanks in advance.

Sure! The TL;DR is: run a VMX (Intel x86) KVM VM on a Linux v6.4+ host and
do a kexec while the KVM VM is still running. Before this patch the system
will triple fault.

In more detail:

Run a bare metal host on a modern Intel CPU with VMX support. The kernel I
was using was 6.7.0-rc5+. You can totally do this with a QEMU "host" as
well, btw; that's how I did the debugging, attaching GDB to it to figure
out what was up. If you want a virtual "host", launch it with:

  -cpu host -M q35,kernel-irqchip=split,accel=kvm -enable-kvm

Launch a KVM guest VM, eg:

  qemu-system-x86_64 \
    -enable-kvm \
    -cdrom alpine-virt-3.19.0-x86_64.iso \
    -nodefaults -nographic -M q35 \
    -serial mon:stdio

While the guest VM is *still running*, do a kexec on the host, eg:

  kexec -l --reuse-cmdline --initrd=config-6.7.0-rc5+ vmlinuz-6.7.0-rc5+ && \
    kexec -e

The kexec can be to anything, but I generally just kexec to the same
kernel/ramdisk as is currently running, ie: a same-version kexec. Before
this patch the kexec will get stuck; after it, the kexec goes smoothly and
the system ends up in the new kernel in a few seconds.

I hope those steps are clear and you can repro this?
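Put together as a script, the steps above look roughly like this. This is
only a sketch: the file names are the ones from my setup above, and by
default (DRY_RUN=1) it just prints the commands so you can review them
before running on a real host you're willing to reboot:

```shell
#!/bin/sh
# Repro sketch for the kexec-with-live-KVM-guest triple fault.
# DRY_RUN=1 (default) only echoes the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. Launch a KVM guest (in a real run, leave it running, e.g. in another
#    terminal or backgrounded -- the guest must still exist at kexec time).
run qemu-system-x86_64 \
    -enable-kvm \
    -cdrom alpine-virt-3.19.0-x86_64.iso \
    -nodefaults -nographic -M q35 \
    -serial mon:stdio

# 2. While the guest is still running, kexec the host into the same kernel.
run kexec -l --reuse-cmdline --initrd=config-6.7.0-rc5+ vmlinuz-6.7.0-rc5+
run kexec -e
```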
BTW, the reason it's important for the KVM VM to still be running when the
host does the kexec is that KVM internally maintains a usage counter and
will disable virtualisation once all VMs have been terminated, via:

  __fput(kvm_fd)
    kvm_vm_release
      kvm_destroy_vm
        hardware_disable_all
          hardware_disable_all_nolock
            kvm_usage_count--;
            if (!kvm_usage_count)
                on_each_cpu(hardware_disable_nolock, NULL, 1);

So if all KVM fds are closed, then kexec will work, because VMXE is cleared
on all CPUs when the last VM is destroyed. If the KVM fds are still open
(ie: the QEMU processes still exist) then the issue manifests.

It sounds nasty to do a kexec while QEMU processes are still around, but
this is a perfectly normal flow for live update:

1. Pause and serialise VM state.
2. kexec.
3. Deserialise and resume the VMs.

In that flow there's no need to actually kill the QEMU process; as long as
the VM is *paused* and has been serialised, we can happily kexec.

JG

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec
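P.S. To make the usage-counter behaviour above concrete, here is a toy
shell model of it. This is not kernel code: the variable and function
names only mirror KVM's, and a single flag stands in for CR4.VMXE across
all CPUs. It just shows that virtualisation stays enabled until the *last*
VM goes away:

```shell
#!/bin/sh
# Toy model of KVM's hardware_enable_all / hardware_disable_all refcounting.
kvm_usage_count=0
vmx_enabled=0   # stands in for CR4.VMXE on every CPU

hardware_enable_all() {
    kvm_usage_count=$((kvm_usage_count + 1))
    # kernel: on_each_cpu(hardware_enable_nolock, ...) for the first VM
    if [ "$kvm_usage_count" -eq 1 ]; then vmx_enabled=1; fi
}

hardware_disable_all() {
    kvm_usage_count=$((kvm_usage_count - 1))
    # kernel: on_each_cpu(hardware_disable_nolock, ...) for the last VM
    if [ "$kvm_usage_count" -eq 0 ]; then vmx_enabled=0; fi
}

hardware_enable_all    # first VM created
hardware_enable_all    # second VM created
hardware_disable_all   # one VM destroyed: VMX stays on
echo "after one destroy: vmx_enabled=$vmx_enabled"
hardware_disable_all   # last VM destroyed: VMX goes off, kexec is now safe
echo "after last destroy: vmx_enabled=$vmx_enabled"
```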