Hi Johan,

Putting Leif on cc, although he is OoO and so it may take him a while
to respond.

On Thu, 28 Nov 2024 at 09:20, Johan Hovold <johan@xxxxxxxxxx> wrote:
>
> Hi Ard,
>
> We've run into a buggy UEFI implementation on the Qualcomm Snapdragon
> based Lenovo ThinkPad T14s where ExitBootServices() often fails.
>
> One bootloader entry may fail to start almost consistently (once in a
> while it may start), while a second entry may always work even when the
> kernel, dtb and initramfs images are copies of the failing entry on the
> same ESP.
>
> This can be worked around by starting and exiting a UEFI shell from the
> bootloader or by starting the bootloader manually via the Boot Menu
> (F12) before starting the kernel.
>
> Notably starting the kernel automatically from the shell startup.nsh
> does not work, while calling the same script manually works.
>
> Based on your comments to a similar report for an older Snapdragon based
> Lenovo UEFI implementation [1], I discovered that allocating an event
> before calling ExitBootServices() can make the call succeed. There is
> often no need to actually signal the event group, but the event must
> remain allocated (i.e. CloseEvent() must not be called).
>
> (Raising TPL or disabling interrupts does not seem to help.)
>
> Also with the event signalling, ExitBootServices() sometimes fails when
> starting the kernel automatically from a shell startup.nsh, while
> systemd-boot seems to always work. This was only observed after removing
> some efi_printk() used during the experiments from the stub...
>
> Something is obviously really broken here, but do you have any
> suggestions about what could be the cause of this as further input to
> Qualcomm (and Lenovo) as they try to fix this?
>
> For completeness, the first call to ExitBootServices() often fails also
> on the x1e80100 reference design (CRD), and Qualcomm appears to have
> been the ones providing the current retry implementation:
>
>     fc07716ba803 ("efi/libstub: Introduce ExitBootServices helper")
>
> as this was needed to prevent similar boot failures with older Qualcomm
> UEFI fw.
>
> Marc is also hitting something like this on the Qualcomm X1E devkit
> (i.e. with firmware that should not have any modifications from Lenovo).
>

So the error code is EFI_INVALID_PARAMETER in all cases?

In the upstream implementation, the only thing that can make
ExitBootServices() return an error is a mismatch of the map key, and
so there is something changing the memory map.

This might be due to a handler of the
gEfiEventBeforeExitBootServicesGuid event group that fails to close
the event, and so it gets signaled every time. This is a fairly recent
addition, though, so I'm not sure it even exists in QCOM's tree.

In upstream EDK2, the map key is just a monotonic counter that gets
incremented on every memory map update, so one experiment worth
conducting is to repeat the second call to ExitBootServices() a couple
of times, increasing the map key each time. Or use GetMemoryMap() to
just grab the map key without the actual memory map, and print it to
the console (although the timer is disabled on the first call, so
anything that relies on that will be shut down at this point).
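
To make the map key experiment concrete, here is a rough, self-contained
toy model in plain C. It is not EFI stub code: struct mock_fw,
mock_exit_boot_services() and probe_map_key_delta() are all hypothetical
names, the status values are made up, and the model assumes the map only
changes once, on the first exit attempt. The real experiment would have to
be done in the stub against the firmware's own ExitBootServices().

```c
#include <assert.h>
#include <stdint.h>

/* Toy status codes; the real EFI status values differ. */
enum { TOY_SUCCESS = 0, TOY_INVALID_PARAMETER = 1 };

/* Hypothetical stand-in for firmware state; not a real EFI type. */
struct mock_fw {
    uint64_t map_key;      /* monotonic counter, bumped on every map update */
    uint64_t pending_bump; /* map updates triggered by the first exit attempt */
    int exited;            /* set once an exit attempt succeeds */
};

/* Mock of the exit call: succeeds only if the caller's key is current. */
int mock_exit_boot_services(struct mock_fw *fw, uint64_t key)
{
    /* Assumption: a one-off disturbance changes the map on the first call. */
    fw->map_key += fw->pending_bump;
    fw->pending_bump = 0;

    if (key != fw->map_key)
        return TOY_INVALID_PARAMETER;

    fw->exited = 1;
    return TOY_SUCCESS;
}

/*
 * Retry with stale_key, stale_key + 1, ... for up to max_attempts calls.
 * Returns the increment that made the call succeed, or -1 if none did.
 */
int probe_map_key_delta(struct mock_fw *fw, uint64_t stale_key,
                        int max_attempts)
{
    assert(max_attempts > 0);

    for (int delta = 0; delta < max_attempts; delta++) {
        if (mock_exit_boot_services(fw, stale_key + (uint64_t)delta) ==
            TOY_SUCCESS)
            return delta;
    }
    return -1;
}
```

In this model, a firmware that starts at map_key 100 and performs three
map updates during the first attempt makes a probe from the stale key 100
succeed at an increment of 3. If something kept changing the map on every
call instead, a probe like this would never converge, which would in
itself be a useful data point.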