Hi Alex,

On Wed, Jan 17, 2024 at 02:46:56PM +0000, Alexander Graf wrote:
> We now have all bits in place to support KHO kexecs. This patch adds
> awareness of KHO in the kexec file as well as boot path for x86 and
> adds the respective kconfig option to the architecture so that it can
> use KHO successfully.
> 
> In addition, it enlightens its decompression code with KHO so that its
> KASLR location finder only considers memory regions that are not
> already occupied by KHO memory.
> 
> Signed-off-by: Alexander Graf <graf@xxxxxxxxxx>
> 
> ---
> 
> v1 -> v2:
> 
>   - Change kconfig option to ARCH_SUPPORTS_KEXEC_KHO
>   - s/kho_reserve_mem/kho_reserve_previous_mem/g
>   - s/kho_reserve/kho_reserve_scratch/g
> ---
>  arch/x86/Kconfig                      |  3 ++
>  arch/x86/boot/compressed/kaslr.c      | 55 +++++++++++++++++++++++++++
>  arch/x86/include/uapi/asm/bootparam.h | 15 +++++++-
>  arch/x86/kernel/e820.c                |  9 +++++
>  arch/x86/kernel/kexec-bzimage64.c     | 39 +++++++++++++++++++
>  arch/x86/kernel/setup.c               | 46 ++++++++++++++++++++++
>  arch/x86/mm/init_32.c                 |  7 ++++
>  arch/x86/mm/init_64.c                 |  7 ++++
>  8 files changed, 180 insertions(+), 1 deletion(-)

...

> @@ -987,8 +1013,26 @@ void __init setup_arch(char **cmdline_p)
>  	cleanup_highmap();
>  
>  	memblock_set_current_limit(ISA_END_ADDRESS);
> +
>  	e820__memblock_setup();
>  
> +	/*
> +	 * We can resize memblocks at this point, let's dump all KHO
> +	 * reservations in and switch from scratch-only to normal allocations
> +	 */
> +	kho_reserve_previous_mem();
> +
> +	/* Allocations now skip scratch mem, return low 1M to the pool */
> +	if (is_kho_boot()) {
> +		u64 i;
> +		phys_addr_t base, end;
> +
> +		__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
> +				     MEMBLOCK_SCRATCH, &base, &end, NULL)
> +			if (end <= ISA_END_ADDRESS)
> +				memblock_clear_scratch(base, end - base);
> +	}

You had to mark the lower 16M as MEMBLOCK_SCRATCH because at this point
the mapping of the physical memory is not ready yet and the page tables
only cover the lower 16M plus the memory mapped in kexec::init_pgtable().
Hence the call to memblock_set_current_limit(ISA_END_ADDRESS) slightly
above, which essentially makes the scratch memory reserved by KHO
unusable for allocations.

I'd suggest moving kho_reserve_previous_mem() earlier, probably even
right next to kho_populate(). kho_populate() already does
memblock_add(scratch), and at that point it's the only physical memory
that memblock knows of, so if memblock has to allocate, the allocations
will end up there.

Also, there are no kernel allocations before e820__memblock_setup(), so
the only memory that might need to be allocated is for
memblock_double_array(), and that will be discarded later anyway.

With this, it seems that MEMBLOCK_SCRATCH is not needed, as the scratch
memory is the only usable memory up to e820__memblock_setup() anyway.

>  	/*
>  	 * Needs to run after memblock setup because it needs the physical
>  	 * memory size.
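To make the suggested ordering concrete, something along these lines is
what I have in mind. This is a completely untested sketch: I'm eliding
kho_populate()'s arguments and the surrounding setup_arch() code, and
only illustrating the relative order of the calls:

        /*
         * kho_populate() memblock_add()s the scratch ranges, and at this
         * point they are the only memory memblock knows about.
         */
        kho_populate(/* addresses handed over via boot params */);

        /*
         * Reserve the handed-over ranges right away. Anything that has
         * to allocate before e820__memblock_setup() (in practice only
         * memblock_double_array()) can only land in scratch memory, so
         * no MEMBLOCK_SCRATCH marking is needed.
         */
        kho_reserve_previous_mem();

        /* ... the rest of setup_arch() up to ... */

        e820__memblock_setup();

With that, the memblock_clear_scratch() loop above should become
unnecessary as well.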
> @@ -1104,6 +1148,8 @@ void __init setup_arch(char **cmdline_p)
>  	 */
>  	arch_reserve_crashkernel();
>  
> +	kho_reserve_scratch();
> +
>  	memblock_find_dma_reserve();
>  
>  	if (!early_xdbc_setup_hardware())
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index b63403d7179d..6c3810afed04 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -20,6 +20,7 @@
>  #include <linux/smp.h>
>  #include <linux/init.h>
>  #include <linux/highmem.h>
> +#include <linux/kexec.h>
>  #include <linux/pagemap.h>
>  #include <linux/pci.h>
>  #include <linux/pfn.h>
> @@ -738,6 +739,12 @@ void __init mem_init(void)
>  	after_bootmem = 1;
>  	x86_init.hyper.init_after_bootmem();
>  
> +	/*
> +	 * Now that all KHO pages are marked as reserved, let's flip them back
> +	 * to normal pages with accurate refcount.
> +	 */
> +	kho_populate_refcount();

This should go to mm_core_init(); there's nothing architecture-specific
about it.

> +
>  	/*
>  	 * Check boundaries twice: Some fundamental inconsistencies can
>  	 * be detected at build time already.

-- 
Sincerely yours,
Mike.
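P.S. For the mm_core_init() move, this is roughly what I mean (untested
sketch; the exact spot within mm_core_init() doesn't matter much, as
long as it runs after mem_init() has left the KHO pages marked as
reserved):

        /* mm/mm_init.c */
        void __init mm_core_init(void)
        {
                /* ... existing initialization, including mem_init() ... */

                /*
                 * Every architecture has its KHO pages marked as reserved
                 * by now, so flip them back to normal pages with accurate
                 * refcounts in one common place instead of in each arch's
                 * mem_init().
                 */
                kho_populate_refcount();

                /* ... */
        }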