On 08/05/19 at 06:55pm, Ard Biesheuvel wrote:
> On Mon, 5 Aug 2019 at 11:36, Dave Young <dyoung@xxxxxxxxxx> wrote:
> >
> > kexec reboot fails randomly in UEFI based kvm guest.  The firmware
> > just reset while calling efi_delete_dummy_variable();  Unfortunately
> > I don't know how to debug the firmware, it is also possible a potential
> > problem on real hardware as well although nobody reproduced it.
> >
> > The intention of efi_delete_dummy_variable is to trigger garbage collection
> > when entering virtual mode.  But SetVirtualAddressMap can only run once
> > for each physical reboot, thus kexec_enter_virtual_mode is not necessarily
> > a good place to clean dummy object.
> >
>
> I would argue that this means it is not a good place to *create* the
> dummy variable, and if we don't create it, we don't have to delete it
> either.
>
> > Drop efi_delete_dummy_variable so that kexec reboot can work.
> >
>
> Creating it and not deleting it is bad, so please try and see if we
> can omit the creation on this code path instead.
>

Looking at the code for the dummy variable, it is created only in the
chunk below:

arch/x86/platform/efi/quirks.c:
efi_query_variable_store():
[snip]
	/*
	 * We account for that by refusing the write if permitting it would
	 * reduce the available space to under 5KB. This figure was provided by
	 * Samsung, so should be safe.
	 */
	if ((remaining_size - size < EFI_MIN_RESERVE) &&
		!efi_no_storage_paranoia) {

		/*
		 * Triggering garbage collection may require that the firmware
		 * generate a real EFI_OUT_OF_RESOURCES error. We can force
		 * that by attempting to use more space than is available.
		 */
		unsigned long dummy_size = remaining_size + 1024;
		void *dummy = kzalloc(dummy_size, GFP_KERNEL);

		if (!dummy)
			return EFI_OUT_OF_RESOURCES;

		status = efi.set_variable((efi_char16_t *)efi_dummy_name,
					  &EFI_DUMMY_GUID,
					  EFI_VARIABLE_NON_VOLATILE |
					  EFI_VARIABLE_BOOTSERVICE_ACCESS |
					  EFI_VARIABLE_RUNTIME_ACCESS,
					  dummy_size, dummy);

		if (status == EFI_SUCCESS) {
			/*
			 * This should have failed, so if it didn't make sure
			 * that we delete it...
			 */
			efi_delete_dummy_variable();
		}
[snip]

So the dummy variable is only created when the if condition matches, and
if the creation succeeds it is deleted right away.  The deletion done
while entering virtual mode is therefore always deleting a nonexistent
EFI variable.

Please correct me if I missed something.  If the above is true, then the
call can at least be dropped in the kexec path, because we have a real
bug there which resets the machine.

Thanks
Dave