On Tue, Jan 15, 2019 at 7:02 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 1/10/19 9:12 PM, Pingfan Liu wrote:
> > Background
> > The kaslr kernel can be guaranteed to sit inside an unmovable node
> > after [1].
>
> What does this "[1]" refer to?
>
https://lore.kernel.org/patchwork/patch/1029376/

> Also, can you clarify your terminology here a bit. By "kaslr kernel",
> do you mean the base address?
>
Yes, I mean the randomization of the load address, i.e. the base address.

> > But if the kaslr kernel is located near the end of an unmovable node,
> > then the bottom-up allocator may create page tables which cross the
> > boundary between the unmovable node and the movable node.
>
> Again, I'm confused. Do you literally mean a single page table page? I
> think you mean the page tables, but it would be nice to clarify this,
> and also explicitly state which page tables these are.
>
I mean page table pages, specifically the page tables built by
init_mem_mapping().

> > It is a probability issue; two factors matter: 1) how big the gap is
> > between the kernel end and the unmovable node's end, and 2) how much
> > memory the system owns. An alternative way to fix this issue is to
> > increase the gap in boot/compressed/kaslr*.
>
> Oh, you mean the KASLR code in arch/x86/boot/compressed/kaslr*.[ch]?
> It took me a minute to figure out you were talking about filenames.
>
Sorry, and yes, the code in arch/x86/boot/compressed/kaslr_64.c and
kaslr.c.

> > But in the scenario of PB-level memory, the page tables will take
> > several MB even when using 1GB pages; different page attributes and
> > fragmentation will make things worse. So it is hard to decide how much
> > the gap should increase.
>
> I'm not following this. If we move the image around, we leave holes.
> Why do we need page table pages allocated to cover these holes?
>
I mean that in arch/x86/boot/compressed/kaslr.c, store_slot_info()
computes:

  slot_area.num = (region->size - image_size) / CONFIG_PHYSICAL_ALIGN + 1;

If we denote the size of the page tables as "X", the formula would have
to become:

  slot_area.num = (region->size - image_size - X) / CONFIG_PHYSICAL_ALIGN + 1;

and it is hard to decide X due to the two factors above.

> > The following figure shows the defection of the current bottom-up
> > style:
> > [startA, endA][startB, "kaslr kernel very close to" endB][startC, endC]
>
> "defection"?
>
Oh, defect.

> > If nodeA and nodeB are unmovable while nodeC is movable, then
> > init_mem_mapping() can generate page tables on nodeC, which stains the
> > movable node.
>
> Let me see if I can summarize this:
> 1. The kernel ASLR decompression code picks a spot to place the kernel
>    image in physical memory.
> 2. Some page tables are dynamically allocated near (after) this spot.
> 3. Sometimes, based on the random ASLR location, these page tables fall
>    over into the "movable node" area. Being unmovable allocations, this
>    is not cool.
> 4. To fix this (on 64-bit at least), we stop allocating page tables
>    based on the location of the kernel image. Instead, we allocate
>    using the memblock allocator itself, which knows how to avoid the
>    movable node.
>
Yes, you got my idea exactly. Thanks for summarizing it; it is hard for
me to express clearly in English.

> > This patch makes it a certainty instead of a probability problem. It
> > achieves this by pushing the parsing of the mem hotplug info ahead of
> > init_mem_mapping().
>
> What does memory hotplug have to do with this? I thought this was all
> about early boot.
The info about which memory is hot-pluggable is handed to the memblock
allocator early, via initmem_init() -> ... ->
acpi_numa_memory_affinity_init(), where memblock_mark_hotplug() records
it. Later, when the memblock allocator works, __next_mem_range() checks
this info through memblock_is_hotpluggable() and skips such ranges. A
stand-alone sketch of this flow follows below.

Thanks and regards,
Pingfan
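
For illustration, here is a minimal, self-contained C model of that
flow. It is not the actual memblock code: add_region(),
region_is_hotpluggable() and alloc_bottom_up() are simplified stand-ins
for memblock_add()/memblock_mark_hotplug(), for the
memblock_is_hotpluggable() check inside __next_mem_range(), and for the
memblock allocator itself.

/*
 * Toy model, not kernel code: mark hotpluggable ranges before the early
 * page-table allocations run, and let the allocator skip any region
 * carrying the HOTPLUG flag.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MEMBLOCK_HOTPLUG 0x1	/* region sits on a movable node */
#define MAX_REGIONS	 8

struct region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

static struct region regions[MAX_REGIONS];
static int nr_regions;

/* stand-in for memblock_add() + memblock_mark_hotplug() */
static void add_region(uint64_t base, uint64_t size, unsigned int flags)
{
	regions[nr_regions++] = (struct region){ base, size, flags };
}

/* stand-in for the memblock_is_hotpluggable() check in __next_mem_range() */
static bool region_is_hotpluggable(const struct region *r)
{
	return r->flags & MEMBLOCK_HOTPLUG;
}

/*
 * Bottom-up allocation that refuses to place a page-table page in a
 * hotpluggable region.  The real allocator also handles alignment,
 * reserved ranges, etc.; only the flag check is modelled here.
 */
static uint64_t alloc_bottom_up(uint64_t size)
{
	for (int i = 0; i < nr_regions; i++) {
		if (region_is_hotpluggable(&regions[i]))
			continue;	/* never stain a movable node */
		if (regions[i].size >= size) {
			uint64_t base = regions[i].base;

			regions[i].base += size;
			regions[i].size -= size;
			return base;
		}
	}
	return 0;	/* no suitable unmovable memory found */
}

int main(void)
{
	/* nodeA and nodeB unmovable, nodeC movable, as in the figure */
	add_region(0x000000000, 0x40000000, 0);
	add_region(0x040000000, 0x40000000, 0);
	add_region(0x080000000, 0x40000000, MEMBLOCK_HOTPLUG);

	/* a page-table page now always lands in an unmovable region */
	printf("pgtable page at %#llx\n",
	       (unsigned long long)alloc_bottom_up(0x1000));
	return 0;
}

The point the model makes is the one the patch relies on: once the
hotplug info is recorded in memblock before init_mem_mapping() runs,
every early page-table allocation automatically lands in unmovable
memory, no matter where KASLR placed the kernel image.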