On Tue, 22 Sep 2020 at 11:11, Linus Walleij <linus.walleij@xxxxxxxxxx> wrote:
>
> On Mon, Sep 21, 2020 at 5:41 PM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>
> > Update the p2v patching code so we can deal with displacements that are
> > not a multiple of 16 MiB but of 2 MiB, to prevent wasting of up to 14 MiB
> > of physical RAM when running on a platform where the start of memory is
> > not correctly aligned.
> >
> > For the ARM code path, this simply comes down to using two add/sub
> > instructions instead of one for the carryless version, and patching
> > each of them with the correct immediate depending on the rotation
> > field. For the LPAE calculation, it patches the MOVW instruction with
> > up to 12 bits of offset.
> >
> > For the Thumb2 code path, patching more than 11 bits of displacement
> > is somewhat cumbersome, and given that 11 bits produce a minimum
> > alignment of 2 MiB, which is also the granularity for LPAE block
> > mappings, it makes sense to stick to 2 MiB for the new p2v requirement.
> >
> > Suggested-by: Zhen Lei <thunder.leizhen@xxxxxxxxxx>
> > Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
>
> My understanding of what is going on is limited to the high
> level of things, and being able to do this is just a great thing
> so FWIW:
> Acked-by: Linus Walleij <linus.walleij@xxxxxxxxxx>
>
> If you or Russell need more thorough review I can sit down
> and try to understand at the bit granularity what is going on
> but it requires a bunch of time. Just tell me if you need this.
>

Just to summarize the intent of this code: the ARM kernel's linear map
starts at PAGE_OFFSET, which maps to a physical address (PHYS_OFFSET)
that is platform specific, and is discovered at boot.
Since we don't want to slow down translations between physical and
virtual addresses by keeping the offset in a variable in memory, we
patch the code that performs the translation instead, and put the
offset between PAGE_OFFSET and the start of physical RAM directly
into the instruction opcodes.

Currently, we only patch up to 8 bits of offset, which gives us
4 GiB >> 8 == 16 MiB of granularity, and so if the start of physical
RAM is not a multiple of 16 MiB, we have to round it up to the next
multiple. This wastes some physical RAM, since the memory you skipped
will now live below PAGE_OFFSET, making it inaccessible to the kernel.

By changing the patchable sequences and the patching logic to carry
more bits of offset, we can improve this: 11 bits give us
4 GiB >> 11 == 2 MiB granularity, and so you never waste more than
that amount by rounding up the physical start of DRAM to the next
multiple of 2 MiB. (Note that 2 MiB granularity guarantees that the
linear mapping can be created efficiently, whereas less than 2 MiB
may result in the linear mapping needing another level of page
tables.)

This helps Zhen Lei's scenario, where the start of DRAM is known to be
occupied. It also helps EFI boot, which relies on the firmware's page
allocator to allocate space for the decompressed kernel as low as
possible. And if the KASLR patches ever land for 32-bit, it will give
us 3 more bits of randomization of the placement of the kernel inside
the linear region.
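
To make the granularity arithmetic concrete, here is a small userspace
sketch. It is not the kernel's patching code: the helper names are
invented for illustration, and it only assumes a 32-bit offset of which
the top N bits can be patched into the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: granule implied by patching the top 'bits' bits
 * of a 32-bit offset, i.e. 4 GiB >> bits. */
static uint64_t p2v_granule(unsigned int bits)
{
	assert(bits < 32);
	return (UINT64_C(1) << 32) >> bits;
}

/* Hypothetical helper: round 'addr' up to a power-of-two 'granule'. */
static uint64_t round_up_pow2(uint64_t addr, uint64_t granule)
{
	return (addr + granule - 1) & ~(granule - 1);
}

/* RAM that ends up below PAGE_OFFSET (and so unusable) when the start
 * of DRAM has to be rounded up to the patchable granule. */
static uint64_t wasted_bytes(uint64_t dram_start, unsigned int bits)
{
	return round_up_pow2(dram_start, p2v_granule(bits)) - dram_start;
}
```

With a made-up DRAM start of 0x80200000, wasted_bytes() comes out at
14 MiB for 8 patchable bits and at 0 for 11 patchable bits.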