Re: [RFC PATCH] efi/libstub: Update memory map after calling priv_func

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Aug 1, 2018, at 11:07 AM, Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote:
> 
> On 8/1/2018 10:27 AM, Eric Snowberg wrote:
>>> On Aug 1, 2018, at 9:52 AM, Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote:
>>> 
>>> On 8/1/2018 8:56 AM, Eric Snowberg wrote:
>>>> Commit d2078d5adbe2 ("x86: EFI stub support for large memory maps")
>>>> added support for EFI memory maps larger than 128 entries.  Following
>>>> allocation for this new buffer (within the exit_boot function calling
>>>> alloc_e820ext) another call to efi_get_memory_map was performed.
>>>> This took place before the final call to ExitBootServices (EBS) to
>>>> insure the memory map buffer was large enough since memory allocation
>>>> is not allowed after calling EBS.
>>>> Later this code was refactored and portions were added to
>>>> efi-stub-helper.c. The call to efi_get_memory_map did not remain after
>>>> calling alloc_e820ext.  In a later patch, a new EFI_MMAP_NR_SLACK_SLOTS was
>>>> introduced to help give some headroom should the memory map grow after
>>>> the EBS is called. This does not alway provide enough space with large
>>>> memory maps.
>>> It might be nice if you CC'd the authors of the changes you are "fixing" so that they can chime in.  I guess I'm lucky I noticed this on the list.  You should probably also have a "fixes tag”.
>> Sorry I missed you.
>>> 
>>> Can you elaborate on the specific scenario where EFI_MMAP_NR_SLACK_CLOTS is insufficient? What is the exact failure encountered?
>> A hard hang.
> 
> Sorry, need more detail.  Hang, as in the system doesn't boot?

System doesn’t boot

>  100% of the time?  Randomly?

100% of the time.

> 
> Sounds like, based on what you've given so far, you are hitting the alloc path, which changes the memory map.  EBS fails.  We go into the retry path, but there isn't enough buffer space, so then the priv_func fails, resulting in a failed boot.  Since EBS has already run, you probably don't see any more prints.
> 
> Can you confirm my assumptions?

Exactly.  The memory is highly fragmented.

> 
> You haven't mentioned what system config generates this issue, nor what constitutes a "large memory map" where 8 allocation slots is insufficient.

The other option would be to increase the number from 8 to 128.  With the patch I submitted I was trying to not have every machine pay this penalty. But if this impacts ARM, would that be an acceptable solution instead?

> 
>>> 
>>>> This change will call efi_get_memory_map one final time before calling EBS,
>>>> ensuring there is enough room for the memory map buffer.  The additional
>>>> headroom will be left alone for any memory change that may currently
>>>> be in flight between the final efi_get_memory_map call and EBS.
>>> 
>>> Uhh, I'm thinking this should be Nack'd.
>>> 
>>> This invalidates the entire point of the priv_func because now the memory map may have changed after it was processed by priv_func, but the mem map may not have changed between your added call and EBS.  If that happens, EBS will pass, and we'll go into linux with a corrupt mem map.
>> How would we go into Linux with a corrupt memory map?
> 
> Have you looked at the other places where priv_func is declared?  Hint - https://elixir.bootlin.com/linux/latest/source/drivers/firmware/efi/libstub/fdt.c#L217
> 
> I expect your change will break the ARM64 path, which means it'll break my platform for my customers.  Something I'd like to avoid.
> 
> Lets try to find a solution that works for your problem, but also works for me.

I agree

> 
>>> The mem map can still change between your added call and EBS, so you haven't really done anything to address the existing race condition.
>> This patch is not trying to fix a race condition.
> 
> Your change is impacting a workaround to a race condition that is specified in the UEFI spec. Did you look at the list discussions circa 2016 which discuss the various facets of the code you are attempting to modify?  I'm not claiming the code is perfect, but so far, it seems like you don't completely understand the history and tradeoffs that were accepted, therefore it seems like you may be undoing that "fix".
> 
> Everything is a race condition in the part of code you are touching. The EFI stub isn't the only thing active at this point in time in the system state.  The FW itself (UEFI) is still active, and can be doing things up until the point EBS is called.  Therefore there are very few assumptions you can make.
> 
>> The priv_func is exit_boot_func on x86.  exit_boot_func can allocate much more memory than the approx 384 bytes set aside for slack space (thru alloc_e820ext).  
> 
> Please elaborate on the specific, real world scenario where this is possible.  With numbers.

When testing on a machine containing 16 PCIe cards, we would need 4K of slack space. Increasing the EFI_MMAP_NR_SLACK_SLOTS to 128 solves the problem too.



--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux