On Sat, Oct 6, 2018 at 10:03 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> There's one PTI related layout asymmetry I noticed between 4-level and 5-level kernels:
>
> 47-bit:
>
> +                                                            |
> +                                                            | Kernel-space virtual memory, shared between all processes:
> +____________________________________________________________|___________________________________________________________
> +                  |           |                  |         |
> + ffff800000000000 | -128   TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
> + ffff880000000000 | -120   TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
> + ffffc80000000000 |  -56   TB | ffffc8ffffffffff |    1 TB | ... unused hole
> + ffffc90000000000 |  -55   TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
> + ffffe90000000000 |  -23   TB | ffffe9ffffffffff |    1 TB | ... unused hole
> + ffffea0000000000 |  -22   TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
> + ffffeb0000000000 |  -21   TB | ffffebffffffffff |    1 TB | ... unused hole
> + ffffec0000000000 |  -20   TB | fffffbffffffffff |   16 TB | KASAN shadow memory
> + fffffc0000000000 |   -4   TB | fffffdffffffffff |    2 TB | ... unused hole
> +                  |           |                  |         | vaddr_end for KASLR
> + fffffe0000000000 |   -2   TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
> + fffffe8000000000 | -1.5   TB | fffffeffffffffff |  0.5 TB | LDT remap for PTI
> + ffffff0000000000 |   -1   TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
> +__________________|____________|__________________|_________|____________________________________________________________
> +                                                            |
>
> 56-bit:
>
> +                                                            |
> +                                                            | Kernel-space virtual memory, shared between all processes:
> +____________________________________________________________|___________________________________________________________
> +                  |           |                  |         |
> + ff00000000000000 |  -64   PB | ff0fffffffffffff |    4 PB | ... guard hole, also reserved for hypervisor
> + ff10000000000000 |  -60   PB | ff8fffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
> + ff90000000000000 |  -28   PB | ff9fffffffffffff |    4 PB | LDT remap for PTI
> + ffa0000000000000 |  -24   PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
> + ffd2000000000000 | -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
> + ffd4000000000000 |  -11   PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
> + ffd6000000000000 | -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
> + ffdf000000000000 | -8.25  PB | fffffdffffffffff |   ~8 PB | KASAN shadow memory
> + fffffc0000000000 |   -4   TB | fffffdffffffffff |    2 TB | ... unused hole
> +                  |           |                  |         | vaddr_end for KASLR
> + fffffe0000000000 |   -2   TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
> + fffffe8000000000 | -1.5   TB | fffffeffffffffff |  0.5 TB | ... unused hole
> + ffffff0000000000 |   -1   TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
>
> The two layouts are very similar beyond the shift in the offset and the region sizes, except
> for one big asymmetry: the placement of the LDT remap for PTI.
>
> Is there any fundamental reason why the LDT area is mapped into a 4 petabyte (!) area on
> 56-bit kernels, instead of being at the -1.5 TB offset like on 47-bit kernels?
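As an aside on reading the offset column in those tables: each start
address is just a negative offset sign-extended into a canonical kernel
address, e.g. ffff800000000000 == -2^47 == -128 TB. A minimal user-space
sketch, not kernel code, with a hand-picked sample of addresses from the
tables above:

        #include <stdio.h>
        #include <stdint.h>
        #include <inttypes.h>

        int main(void)
        {
                /* Start-of-region addresses from the layout tables above. */
                const uint64_t addrs[] = {
                        0xffff800000000000ull,  /* 47-bit kernel space: -2^47 */
                        0xff00000000000000ull,  /* 56-bit kernel space: -2^56 */
                        0xfffffe0000000000ull,  /* cpu_entry_area:      -2^41 */
                };

                for (size_t i = 0; i < sizeof(addrs) / sizeof(addrs[0]); i++) {
                        /* Kernel addresses are sign-extended: reinterpret as
                         * signed, then divide down to whole terabytes. */
                        int64_t off = (int64_t)addrs[i];
                        printf("%016" PRIx64 " = %" PRId64 " TB\n",
                               addrs[i], off / (INT64_C(1) << 40));
                }
                return 0;
        }

This prints -128 TB, -65536 TB (i.e. -64 PB) and -2 TB, matching the
offset column.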
>
> The only reason I can see for doing it this way is that it's currently coded at the PGD
> level only:
>
>         static void map_ldt_struct_to_user(struct mm_struct *mm)
>         {
>                 pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
>
>                 if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
>                         set_pgd(kernel_to_user_pgdp(pgd), *pgd);
>         }
>
> ( BTW, the 4 petabyte size of the area is misleading: a 5-level PGD entry covers 256 TB of
>   virtual memory, i.e. 0.25 PB, not 4 PB. So in reality we have a 0.25 PB area there, used up
>   by the LDT mapping in a single PGD entry, plus a 3.75 PB hole after that. )
>
> ... but unless I'm missing something it's not really fundamental for it to be at the PGD
> level - it could be two levels lower as well, and it could move back to the same place
> where it is on the 47-bit kernel.

The subtlety is that, if the mapping is below the PGD level, there end up
being page tables that are private to each LDT-using mm but that map
things other than the LDT. Those tables cover the same address ranges as
corresponding tables in init_mm, and if those init_mm tables change after
the LDT mapping is set up, the changes won't propagate. So it probably
could be made to work, but it would take some extra care.
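To make the non-propagation concrete, here is a self-contained toy model
(plain C, deliberately not kernel code: the four-entry "tables" stand in
for real page tables, and the pointer copy stands in for what
map_ldt_struct_to_user() does when it copies a whole PGD entry):

        #include <stdio.h>
        #include <string.h>

        /* Toy "page table": four entries standing in for 512 real ones. */
        struct table { int entry[4]; };

        int main(void)
        {
                /* A lower-level table owned by init_mm, mapping the LDT
                 * region plus unrelated neighbors in the same range. */
                struct table lower = { { 1, 2, 3, 4 } };

                /* PGD-level aliasing: the other mm references the same
                 * table, so later init_mm updates stay visible. */
                struct table *shared_view = &lower;

                /* Below-PGD aliasing: the LDT-using mm gets its own copy
                 * of the table, private from this point on. */
                struct table snapshot;
                memcpy(&snapshot, &lower, sizeof(lower));

                /* init_mm later changes a neighboring mapping... */
                lower.entry[0] = 42;

                printf("shared (PGD-level) view : %d\n", shared_view->entry[0]);
                printf("private lower-level copy: %d\n", snapshot.entry[0]);
                return 0;
        }

The shared view prints 42 while the private copy still prints 1, which is
the "changes won't propagate" problem: moving the LDT remap below the PGD
level would need either nothing else sharing those table levels or some
explicit resynchronization, i.e. the extra care mentioned above.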