On 14/10/2024 18:32, Florian Fainelli wrote:
> On 10/14/24 03:55, Ryan Roberts wrote:
>> Hi All,
>>
>> Patch bomb incoming... This covers many subsystems, so I've included a core set
>> of people on the full series and additionally included maintainers on relevant
>> patches. I haven't included those maintainers on this cover letter since the
>> numbers were far too big for it to work. But I've included a link to this cover
>> letter on each patch, so they can hopefully find their way here. For follow up
>> submissions I'll break it up by subsystem, but for now thought it was important
>> to show the full picture.
>>
>> This RFC series implements support for boot-time page size selection within the
>> arm64 kernel. arm64 supports 3 base page sizes (4K, 16K, 64K), but to date, page
>> size has been selected at compile-time, meaning the size is baked into a given
>> kernel image. As use of larger-than-4K page sizes becomes more prevalent this
>> starts to present a problem for distributions. Boot-time page size selection
>> enables the creation of a single kernel image, which can be told which page size
>> to use on the kernel command line.
>>
>> Why is having an image-per-page size problematic?
>> =================================================
>>
>> Many traditional distros are now supporting both 4K and 64K. And this means
>> managing 2 kernel packages, along with drivers for each. For some, it means
>> multiple installer flavours and multiple ISOs. All of this adds up to a
>> less-than-ideal level of complexity. Additionally, Android now supports 4K and
>> 16K kernels. I'm told having to explicitly manage their KABI for each kernel is
>> painful, and the extra flash space required for both kernel images and the
>> duplicated modules has been problematic. Boot-time page size selection solves
>> all of this.
>>
>> Additionally, in starting to think about the longer term deployment story for
>> D128 page tables, which Arm architecture now supports, a lot of the same
>> problems need to be solved, so this work sets us up nicely for that.
>>
>> So what's the down side?
>> ========================
>>
>> Well nothing's free; Various static allocations in the kernel image must be
>> sized for the worst case (largest supported page size), so image size is in line
>> with size of 64K compile-time image. So if you're interested in 4K or 16K, there
>> is a slight increase to the image size. But I expect that problem goes away if
>> you're compressing the image - it's just some extra zeros. At boot-time, I expect
>> we could free the unused static storage once we know the page size - although
>> that would be a follow up enhancement.
>>
>> And then there is performance. Since PAGE_SIZE and friends are no longer
>> compile-time constants, we must look up their values and do arithmetic at
>> runtime instead of compile-time. My early perf testing suggests this is
>> imperceptible for real-world workloads, and only has small impact on
>> microbenchmarks - more on this below.
>>
>> Approach
>> ========
>>
>> The basic idea is to rid the source of any assumptions that PAGE_SIZE and
>> friends are compile-time constant, but in a way that allows the compiler to
>> perform the same optimizations as was previously being done if they do turn out
>> to be compile-time constant. Where constants are required, we use limits;
>> PAGE_SIZE_MIN and PAGE_SIZE_MAX. See commit log in patch 1 for full description
>> of all the classes of problems to solve.
>>
>> By default PAGE_SIZE_MIN=PAGE_SIZE_MAX=PAGE_SIZE. But an arch may opt-in to
>> boot-time page size selection by defining PAGE_SIZE_MIN & PAGE_SIZE_MAX. arm64
>> does this if the user selects the CONFIG_ARM64_BOOT_TIME_PAGE_SIZE Kconfig,
>> which is an alternative to selecting a compile-time page size.
>>
>> When boot-time page size is active, the arch pgtable geometry macro definitions
>> resolve to something that can be configured at boot. The arm64 implementation in
>> this series mainly uses global, __ro_after_init variables. I've tried using
>> alternatives patching, but that performs worse than loading from memory; I think
>> due to code size bloat.
>
> FWIW, this paragraph was not entirely clear to me until I looked at patch 57 to
> see that the compile time page size selection had been retained, and could
> continue to be used as-is. It was somewhat implicit, but not IMHO explicit
> enough, not a big deal though.

I intended to make that bit clear with the above sentence "arm64 does this if
the user selects the CONFIG_ARM64_BOOT_TIME_PAGE_SIZE Kconfig, which is an
alternative to selecting a compile-time page size.", but appreciate there is a
lot going on here.

>
> Great work, thanks for doing that! This makes me wonder if we could leverage any
> of that to have a single kernel supporting both LPAE and !LPAE on ARM 32-bit,
> but that still seems like somewhat more difficult, largely due to the difference
> in the page table descriptor format (long vs. short).

We will eventually have the exact same problem with FEAT_D128 on arm64. This
introduces page tables with 128-bit PTEs. Ideally we would like to support both
in a single image, although we have much more thinking to do on that. But my
current view is that this series solves a bunch of problems, which makes it
easier (PTRS_PER_Pxx and Pxx_SHIFT all become boot-time values, for example, so
we can easily represent the different geometries).

Yes, we still need to solve the PTE size difference (in our case 64-bit vs
128-bit). I have a couple of proposals for how to do that; the "gold-plated"
approach would be to create and use a handle type to represent a PTE/PxD slot in
a table. Then increments/decrements would be enforced via explicit helpers that
know the size, and direct dereferencing would be impossible.
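
To make the handle idea a bit more concrete, here is a minimal userspace sketch.
It is not taken from any posted series, and every name in it (wide_pte_t,
ptep_handle_t, pte_size, ptep_handle(), ptep_next(), ptep_read(), ptep_write())
is hypothetical. pte_size stands in for a value that would be fixed at boot: 8
for 64-bit descriptors, 16 for 128-bit ones. Reads widen a slot into the larger
value type and writes narrow it again:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical value type, wide enough for either descriptor format. */
typedef struct {
	uint64_t lo;
	uint64_t hi;
} wide_pte_t;

/*
 * Hypothetical opaque handle to one slot in a table. It wraps a byte
 * pointer, so it cannot be dereferenced as a PTE by accident.
 */
typedef struct {
	unsigned char *slot;
} ptep_handle_t;

/* Stand-in for a boot-time value: bytes per table entry (8 or 16). */
static size_t pte_size = 8;

/* Get a handle to entry 'index' of 'table'. */
static inline ptep_handle_t ptep_handle(void *table, size_t index)
{
	ptep_handle_t h = { (unsigned char *)table + index * pte_size };
	return h;
}

/* Advance to the next slot; only the helper knows the entry size. */
static inline ptep_handle_t ptep_next(ptep_handle_t h)
{
	h.slot += pte_size;
	return h;
}

/* Read a slot, widening to the larger value type (hi stays 0 if 8 bytes). */
static inline wide_pte_t ptep_read(ptep_handle_t h)
{
	wide_pte_t v = { 0, 0 };
	memcpy(&v, h.slot, pte_size);
	return v;
}

/* Write a slot, narrowing to the entry size (hi is dropped if 8 bytes). */
static inline void ptep_write(ptep_handle_t h, wide_pte_t v)
{
	memcpy(h.slot, &v, pte_size);
}

/* Small self-check of the 64-bit case. */
static void ptep_demo(void)
{
	uint64_t table[4] = { 0 };
	wide_pte_t v = { 0x1234, 0xdead };

	pte_size = 8;
	ptep_write(ptep_next(ptep_handle(table, 0)), v);
	assert(table[1] == 0x1234);	/* narrowed value landed in slot 1 */
	assert(table[2] == 0);		/* neighbouring slot untouched */
}
```

Callers only ever see ptep_handle_t and wide_pte_t, so the entry size stays a
detail of the helpers. A real implementation would also need atomic accessors
and per-level types, which this sketch leaves out.
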
When accessing via helpers we would pass around pte_t/pxd_t values that are the
larger size, then narrow them when writing back. Anshuman has a series [1] that
starts to move in that direction.

If you have any other ideas, it would be good to talk!

[1] https://lore.kernel.org/linux-mm/20240917073117.1531207-1-anshuman.khandual@xxxxxxx/

Thanks,
Ryan