On 14 October 2018 at 18:51, Eric Snowberg <eric.snowberg@xxxxxxxxxx> wrote: > >> On Oct 14, 2018, at 9:34 AM, Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote: >> >> On 10/12/2018 3:46 PM, Ard Biesheuvel wrote: >>> On 9 August 2018 at 16:50, Eric Snowberg <eric.snowberg@xxxxxxxxxx> wrote: >>>> Commit d64934019f6c ("x86/efi: Use efi_exit_boot_services()") >>>> introduced a regression on systems with large memory maps >>>> causing them to hang on boot. The first "goto get_map" that was removed >>>> from exit_boot insured there was enough room for the memory map when >>>> efi_call_early(exit_boot_services) was called. This happens when >>>> (nr_desc > ARRAY_SIZE(params->e820_table). >>>> >>>> Chain of events: >>>> exit_boot() >>>> efi_exit_boot_services() >>>> efi_get_memory_map <- at this point the mm can't grow over 8 desc >>>> priv_func() >>>> exit_boot_func() >>>> allocate_e820ext() <- new mm grows over 8 desc from e820 alloc >>>> efi_call_early(exit_boot_services) <- mm key doesn't match so retry >>>> efi_call_early(get_memory_map) <- not enough room for new mm >>>> system hangs >>>> >>>> This patch allocates the e820 buffer before calling efi_exit_boot_services >>>> and fixes the regression. >>>> >>>> Signed-off-by: Eric Snowberg <eric.snowberg@xxxxxxxxxx> >>>> --- >>>> arch/x86/boot/compressed/eboot.c | 63 +++++++++++++++++++++++++--------------- >>>> 1 file changed, 40 insertions(+), 23 deletions(-) >>>> >>>> diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c >>>> index cbf4b87..91a650e 100644 >>>> --- a/arch/x86/boot/compressed/eboot.c >>>> +++ b/arch/x86/boot/compressed/eboot.c >>>> @@ -866,11 +866,43 @@ static efi_status_t alloc_e820ext(u32 nr_desc, struct setup_data **e820ext, >>>> return status; >>>> } >>>> >>>> +static efi_status_t allocate_e820(struct boot_params *params, >>>> + struct setup_data **e820ext, >>>> + u32 *e820ext_size) >>>> +{ >>>> + unsigned long map_size, desc_size, buff_size; >>>> + struct efi_boot_memmap boot_map; >>>> + efi_memory_desc_t *map; >>>> + efi_status_t status; >>>> + __u32 nr_desc; >>>> + >>>> + boot_map.map = ↦ >>>> + boot_map.map_size = &map_size; >>>> + boot_map.desc_size = &desc_size; >>>> + boot_map.desc_ver = NULL; >>>> + boot_map.key_ptr = NULL; >>>> + boot_map.buff_size = &buff_size; >>>> + >>>> + status = efi_get_memory_map(sys_table, &boot_map); >>>> + if (status != EFI_SUCCESS) >>>> + return status; >>>> + >>>> + nr_desc = buff_size / desc_size; >>>> + >>> I think we should add some slack here. This function is now called >>> right before the first call to ExitBootServices(), which means events >>> may fire that allocate or free memory between this call to >>> efi_get_memory_map() and the final one which obtains the map key >>> (which is the whole reason for this complicated dance) >> >> Slack in addition to what efi_get_memory_map() provides? > > Would additional slack be necessary here? This takes place before memory allocation is no longer possible. The efi_get_memory_map function will be called again afterwards insuring their is slack available for the memory map passed into EBS. > The point is that memory allocation is always possible due to asynchronous events firing. Only after the first call to EBS() is it guaranteed that event dispatch has terminated. But to Jeffrey's point, efi_get_memory_map() already has 8 slots' worth of slack, which means that we have 8 spare e820 table entries as well. I guess this should be sufficient.