On 9 February 2017 at 19:04, Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote:
> On 2/9/2017 11:26 AM, Ard Biesheuvel wrote:
>>
>> BTW the memory map isn't necessarily sorted per the UEFI spec, so it
>> iterates in what is essentially random order, not low to high.
>
> True, I'm used to EDK2, which from what I've seen, keeps it ordered.
> However, that's somewhat immaterial to my point that it's possible for
> initrd to be far enough from kernel to break booting.txt.
>
>>>> It looks to see if a slot can hold the allocation, and the slot
>>>> does not exceed the specified max. If so, efi_high_alloc retains a
>>>> reference to the slot. Then efi_high_alloc continues iterating
>>>> through the map, until the end. efi_high_alloc only stores a
>>>> reference to the most recently valid slot, which would be the
>>>> highest slot in the map.
>>>
>>> It is documented as
>>>
>>> /*
>>>  * Allocate at the highest possible address that is not above 'max'.
>>>  */
>>>
>>> and what you describe is pretty much that, no?
>>>
>>>> My system can have 256GB (or more) of RAM. It is possible, however
>>>> remote, that the initrd and kernel can be more than 64GB away from
>>>> each other.
>>>>
>>>> Let's assume KASLR puts the kernel at 250GB. Let's assume, for
>>>> whatever reason, we can't fit the initrd above 150GB (there was just
>>>> enough room to jam kernel there somehow, but firmware is consuming
>>>> the rest, maybe it put rootfs there via NFIT).
>>>
>>> So before even booting the kernel, you already have 100 GB of memory
>>> occupied?
>
> That is possible, yes. Likely? Probably not. Would our system fail if
> initrd and kernel are farther than the prescribed restriction? No, since
> the system can address all of RAM, we'd probably be fine.
>
>>> As I replied before, you are correct that in this case, you
>>> will not be able to put the initrd within 32 GB of the kernel. But do
>>> note that this 32 GB figure is derived from the linear region size of
>>> a 16k pages kernel with 2 levels of translation, which is a niche
>>> configuration by itself. On a system that has 256 GB of RAM, it is
>>> highly unlikely that you will be using a kernel that can only map 32
>>> GB of it.
>>>
>>> The reason for choosing the 32 GB figure is that it relieves the boot
>>> loader from having to go and figure out what kind of kernel is going
>>> to be executed. Page size can be read from the Image header but the VA
>>> size cannot. So 32 GB was a reasonable number imo.
>
> Ok, so the restriction is completely arbitrary and has no real purpose.
> I.e. nothing in the kernel will break, so long as you assume the system
> is not configured with more RAM than can be addressed, which doesn't
> feel reasonable to do.
>
> I realize I'm being nitpicky, but from my perspective, any issues
> related to efistub are particularly difficult to debug, so if this
> scenario we've been going around about ever popped up, it wouldn't even
> give you a print that happened when you back trace the output trying to
> figure out why the boot failed.
>
> However, it really looks like even if the scenario occurred, there is
> zero realistic expectation anything would break, and it's just a
> violation of some document that makes assumptions and should be treated
> more as guidance to try to follow, rather than hard rules.
>

Actually, there is no reason for the stub to adhere 100% to the boot
protocol, and we already violate it deliberately by randomizing the
physical offset with a 64 KB granularity under KASLR, whereas the boot
protocol stipulates that it should be a 2 MB aligned base + TEXT_OFFSET.

So in this case, I think it is reasonable to take VA_BITS into account
rather than use a hardcoded 32 GB. That will eliminate the problem
completely when you use a 48-bit VA kernel, and will make it highly
unlikely to ever occur on more common 39 and 42 bit VA configurations.

--
Ard.
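
For illustration only, here is a minimal sketch of the arithmetic behind taking VA_BITS into account. It is not the stub's actual code. It assumes the linear region covers half of the kernel VA space, which reproduces the 32 GB figure for a 36-bit VA (16k pages, 2 levels). It also assumes the window must be 1 GB aligned and cover both the kernel Image and the initrd. The names linear_region_size(), initrd_within_window() and SZ_1G are hypothetical helpers introduced for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1G (UINT64_C(1) << 30)

/*
 * Assumption: the linear region covers half of the kernel VA space,
 * i.e. 2^(va_bits - 1) bytes. For a 36-bit VA (16k pages, 2 levels)
 * this gives the 32 GB figure discussed in the thread.
 */
static uint64_t linear_region_size(unsigned int va_bits)
{
	assert(va_bits >= 1 && va_bits <= 64);
	return UINT64_C(1) << (va_bits - 1);
}

/*
 * Hypothetical check: do the kernel image and the initrd both fit in a
 * single 1 GB aligned window of linear_region_size(va_bits) bytes?
 * Callers must ensure base + size does not overflow.
 */
static bool initrd_within_window(uint64_t kernel_base, uint64_t kernel_size,
				 uint64_t initrd_base, uint64_t initrd_size,
				 unsigned int va_bits)
{
	uint64_t lo = kernel_base < initrd_base ? kernel_base : initrd_base;
	uint64_t kernel_end = kernel_base + kernel_size;
	uint64_t initrd_end = initrd_base + initrd_size;
	uint64_t hi = kernel_end > initrd_end ? kernel_end : initrd_end;

	lo &= ~(SZ_1G - 1);	/* round the window start down to 1 GB */
	return hi - lo <= linear_region_size(va_bits);
}
```

With the example from the thread (kernel at 250 GB, initrd just below 150 GB), this check fails for a 36-bit VA, where the window is 32 GB, and passes for 39, 42 and 48 bit VAs, where the window is 256 GB, 2 TB and 128 TB respectively.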