Re: [PATCH] efi: arm-stub: Correct FDT and initrd allocation rules for arm64

On 2/9/2017 11:26 AM, Ard Biesheuvel wrote:
On 9 February 2017 at 18:18, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
On 9 February 2017 at 18:01, Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote:
On 2/9/2017 10:45 AM, Ard Biesheuvel wrote:

On 9 February 2017 at 17:41, Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote:

On 2/9/2017 10:16 AM, Ard Biesheuvel wrote:


On 9 February 2017 at 17:06, Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote:


On 2/9/2017 3:16 AM, Ard Biesheuvel wrote:



On arm64, we have made some changes over the past year to the way the
kernel itself is allocated and to how it deals with the initrd and FDT.
This patch brings the allocation logic in the EFI stub in line with that,
which is necessary because the introduction of KASLR has created the
possibility for the initrd to be allocated in a place where the kernel
may not be able to map it. (This is currently a theoretical scenario,
since it only affects systems where the size of RAM exceeds the size of
the linear mapping.)

So adhere to the arm64 boot protocol, and make sure that the initrd is
fully inside a 1GB aligned 32 GB window that covers the kernel as well.

The FDT may be anywhere in memory on arm64 now that we map it via the
fixmap, so we can lift the address restriction there completely.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
---




I'll give this a test on our platform that was running into the current
limitation - probably this weekend.

I reviewed the code and it's OK, but I do have one question.  Do we need
to handle the case where the initrd ends up below the kernel?

Let's assume KASLR puts the kernel somewhere up high in DDR, after the
32GB mark in DDR.  Now let's assume the unlikely scenario that the initrd
won't fit anywhere after 32GB, but will fit before 32GB.  Per my
understanding of efi_high_alloc, it will put the initrd before the 32GB
mark, which will be outside of the window where the kernel is.


The 32 GB window does not have to be 32 GB aligned, only 1 GB aligned.
So as long as the following expression holds, we should be fine:


align(max(kernel_end, initrd_end), 1g) -
    align_down(min(kernel_start, initrd_start), 1g) <= 32g
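
For illustration only, the expression above can be written as a small C
predicate. This is a sketch, not code from the patch: the helper name
initrd_window_ok() and the GiB() macro are made up here, and end
addresses are assumed to be exclusive.

```c
#include <stdbool.h>
#include <stdint.h>

#define GiB(n) ((uint64_t)(n) << 30)

/* Round down/up to a 1 GB boundary. */
static uint64_t align_down_1g(uint64_t x) { return x & ~(GiB(1) - 1); }
static uint64_t align_up_1g(uint64_t x)   { return align_down_1g(x + GiB(1) - 1); }

/*
 * Hypothetical helper: true if the kernel image and the initrd both fit
 * inside one 1 GB aligned window of at most 32 GB. End addresses are
 * treated as exclusive (an assumption of this sketch).
 */
static bool initrd_window_ok(uint64_t kernel_start, uint64_t kernel_end,
                             uint64_t initrd_start, uint64_t initrd_end)
{
    uint64_t lo = kernel_start < initrd_start ? kernel_start : initrd_start;
    uint64_t hi = kernel_end > initrd_end ? kernel_end : initrd_end;

    return align_up_1g(hi) - align_down_1g(lo) <= GiB(32);
}
```

With a kernel near 250 GB and an initrd ending at 150 GB, as in the
scenario discussed later in this thread, the predicate is false; with
the initrd 20 GB below the kernel it is true.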



Yes, and I argue there is a possibility (we'll call it extremely remote)
where that may not hold.  My question is, do we care about that
possibility, and if so, do we do anything about it?


We allocate top down, so we start at align_down(base_of_image, 1g) +
32g, and go down until we hit a region that fits our initrd. We will
disregard the region that the kernel occupies, but below that, we will
just proceed until we find a slot. This effectively means we have a 63
GB window, with the kernel in the middle, where we can load the initrd
and adhere to the boot protocol. I don't see how we could end up in
the situation where we load the kernel somewhere, and both the 31 GB
before *and* after are completely occupied.


No, we don't.  We do not allocate top down.  Please look at efi_high_alloc.

efi_high_alloc iterates through the memory map, low to high.

BTW the memory map isn't necessarily sorted per the UEFI spec, so it
iterates in what is essentially random order, not low to high.

True, I'm used to EDK2, which from what I've seen, keeps it ordered. However, that's somewhat immaterial to my point that it's possible for the initrd to be far enough from the kernel to break booting.txt.


It looks to see
if a slot can hold the allocation, and the slot does not exceed the
specified max.  If so, efi_high_alloc retains a reference to the slot. Then
efi_high_alloc continues iterating through the map, until the end.
efi_high_alloc only stores a reference to the most recently valid slot,
which would be the highest slot in the map.
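
As a rough model of the behaviour being discussed (not the stub's actual
code), the selection can be sketched in C as below. The struct region
type, the high_alloc_model() name and the example map are all
hypothetical. The sketch keeps the highest candidate seen so far rather
than the most recent one, so the result does not depend on the order of
the map.

```c
#include <stddef.h>
#include <stdint.h>

#define MiB(n) ((uint64_t)(n) << 20)
#define GiB(n) ((uint64_t)(n) << 30)

/* Hypothetical free region [start, end), standing in for a map entry. */
struct region {
    uint64_t start;
    uint64_t end;
};

/* Example map, deliberately unsorted; nothing is free above 150 GB. */
static const struct region example_map[] = {
    { GiB(140), GiB(150) },
    { 0,        GiB(5)   },
    { GiB(10),  GiB(20)  },
};

/*
 * Return the highest 'align'-aligned address at which 'size' bytes fit
 * in some region without extending above 'max', or 0 if nothing fits.
 * 'align' must be a power of two.
 */
static uint64_t high_alloc_model(const struct region *map, size_t n,
                                 uint64_t size, uint64_t align, uint64_t max)
{
    uint64_t best = 0;

    for (size_t i = 0; i < n; i++) {
        /* Clamp the usable end of this region to 'max'. */
        uint64_t end = map[i].end < max ? map[i].end : max;
        uint64_t start;

        if (end < size)
            continue;
        start = (end - size) & ~(align - 1);
        if (start < map[i].start)
            continue;       /* does not fit in this region */
        if (start > best)
            best = start;   /* keep the highest candidate so far */
    }
    return best;
}
```

With this map and a 512 MB request, a max of 282 GB yields an address
just below 150 GB, which is the situation described further down: the
allocation is as high as possible, yet still far below a kernel placed
at 250 GB.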


It is documented as

/*
 * Allocate at the highest possible address that is not above 'max'.
 */

and what you describe is pretty much that, no?

My system can have 256GB (or more) of RAM.  It is possible, however remote,
that the initrd and kernel can be more than 64GB away from each other.

Let's assume KASLR puts the kernel at 250GB.  Let's assume, for whatever
reason, we can't fit the initrd above 150GB (there was just enough room to
jam the kernel there somehow, but firmware is consuming the rest, maybe it
put rootfs there via NFIT).

So before even booting the kernel, you already have 100 GB of memory
occupied?

That is possible, yes. Likely? Probably not. Would our system fail if the initrd and kernel are farther apart than the prescribed restriction? No, since the system can address all of RAM, we'd probably be fine.

As I replied before, you are correct that in this case, you
will not be able to put the initrd within 32 GB of the kernel. But do
note that this 32 GB figure is derived from the linear region size of
a 16k pages kernel with 2 levels of translation, which is a niche
configuration by itself. On a system that has 256 GB of RAM, it is
highly unlikely that you will be using a kernel that can only map 32
GB of it.

The reason for choosing the 32 GB figure is that it relieves the boot
loader from having to go and figure out what kind of kernel is going
to be executed. Page size can be read from the Image header but the VA
size cannot. So 32 GB was a reasonable number imo.
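
The arithmetic behind the 32 GB figure can be sketched as follows. This
is an illustration with made-up helper names, and it assumes two things
that are not spelled out above: each translation level resolves
(page_shift - 3) bits, and the linear region takes half of the kernel VA
space. It also ignores configurations where the top level is only
partially used.

```c
#include <stdint.h>

/*
 * VA bits covered by 'levels' full translation levels for a page size of
 * (1 << page_shift) bytes. Each table holds 8-byte entries, so one level
 * resolves (page_shift - 3) bits.
 */
static unsigned int va_bits(unsigned int page_shift, unsigned int levels)
{
    return page_shift + levels * (page_shift - 3);
}

/*
 * Size of the linear region, assuming it is half of the kernel VA space
 * (an assumption of this sketch).
 */
static uint64_t linear_region_size(unsigned int page_shift, unsigned int levels)
{
    return UINT64_C(1) << (va_bits(page_shift, levels) - 1);
}
```

For 16k pages (page_shift 14) and 2 levels this gives 36 VA bits and a
32 GB linear region; for 4k pages and 3 levels it gives 39 VA bits and
256 GB.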

OK, so the restriction is completely arbitrary and has no real purpose. I.e., nothing in the kernel will break, so long as you assume the system is not configured with more RAM than can be addressed, which doesn't feel reasonable to do.

I realize I'm being nitpicky. From my perspective, any issues related to efistub are particularly difficult to debug, so if the scenario we've been going around about ever popped up, there wouldn't even be a print to find when you trace back through the output trying to figure out why the boot failed.

However, it really looks like even if the scenario occurred, there is zero realistic expectation anything would break, and it's just a violation of some document that makes assumptions and should be treated more as guidance to try to follow, rather than hard rules.

I guess I'm satisfied, and don't see any need to continue the discussion. Thanks for entertaining me.


efi_high_alloc will put the initrd at some point
just below 150GB, because it iterates low to high,

No, because everything above that is occupied. If efi_high_alloc()
does not do what it says on the tin, we should fix that.

I will agree, efi_high_alloc() does what it says on the tin (my interpretation of what you were saying was not what you intended, sorry about that), but relying on that is not sufficient to implicitly assume that we are holding to the restrictions in booting.txt in all scenarios.


and 150GB will be below
the max of 250GB where the kernel is.  This will result in the initrd and
kernel being ~100GB away in this example, which violates the requirements
stated in booting.txt.

I see the situation is possible, but I admit it is remote.  If you want to
ignore it, fine.  I would be happy with that so long as the assumption is
documented so that if it is ever somehow violated in the real world, we know
what broke.

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.