On Wed, Jul 13, 2022 at 10:16:36PM -0700, Christoph Hellwig wrote:
> On Wed, Jul 13, 2022 at 12:20:09PM +0200, Peter Zijlstra wrote:
> > Start by adding VM_TOPDOWN_VMAP, which instead of returning the lowest
> > (leftmost) vmap_area that fits, picks the highest (rightmost).
> >
> > Then add module_alloc_data() that uses VM_TOPDOWN_VMAP and make
> > ARCH_WANTS_MODULE_DATA_IN_VMALLOC use that instead of vmalloc (with a
> > weak function doing the vmalloc).
> >
> > This gets you bottom of module range is RO+X only, top is shattered
> > between different !X types.
> >
> > Then track the boundary between X and !X and ensure module_alloc_data()
> > and module_alloc() never cross over and stay strictly separated.
> >
> > Then change all module_alloc() users to expect RO+X memory, instead of
> > RW.
> >
> > Then make sure any extension of the X range is 2M aligned.
> >
> > And presto, *everybody* always uses 2M TLB for text, modules, bpf,
> > ftrace, the lot and nobody is tracking chunks.
> >
> > Maybe migration can be eased by instead providing module_alloc_text()
> > and ARCH_WANTS_MODULE_ALLOC_TEXT.
>
> This all looks pretty sensible. How are we going to do the initial
> write to the executable memory, though?

With something like text_poke_memcpy(). I suppose that the proposed
ARCH_WANTS_MODULE_ALLOC_TEXT needs to imply availability of that too.

If the 4K copy thing ends up being a bottleneck we can easily extend
that to have a 2M option as well.
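
For illustration only, here is a rough userspace model of the layout described in the quoted proposal: text grows up from the bottom of the module range, data grows down from the top, and the X region is only ever extended in 2M steps. This is not kernel code. `struct mod_range`, `range_init()`, `alloc_text()` and `alloc_data()` are hypothetical names made up for this sketch, it uses simple bump allocation with no freeing, and it does not model the initial write (the text_poke_memcpy() step).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ    4096UL
#define TEXT_ALIGN (2UL << 20)  /* 2M granularity for the X region */

/* Hypothetical model of the module range; not a kernel structure. */
struct mod_range {
	uintptr_t text_brk;   /* next free text address, grows up */
	uintptr_t text_limit; /* end of the X region: the X/!X boundary */
	uintptr_t data_brk;   /* lowest data address handed out, grows down */
};

static uintptr_t align_up(uintptr_t v, uintptr_t a)
{
	return (v + a - 1) & ~(a - 1);
}

/* Assumes 'start' is 2M aligned and 'size' is a multiple of 2M. */
static void range_init(struct mod_range *r, uintptr_t start, uintptr_t size)
{
	r->text_brk = start;
	r->text_limit = start;
	r->data_brk = start + size;
}

/* Bottom-up: returns an address in the X region, or 0 on failure. */
static uintptr_t alloc_text(struct mod_range *r, uintptr_t size)
{
	uintptr_t addr = r->text_brk;
	uintptr_t new_brk = addr + align_up(size, PAGE_SZ);

	if (new_brk > r->text_limit) {
		/* Extend the X region, but only in 2M-aligned steps. */
		uintptr_t new_limit = align_up(new_brk, TEXT_ALIGN);

		if (new_limit > r->data_brk)
			return 0; /* would cross into !X memory */
		r->text_limit = new_limit;
	}
	r->text_brk = new_brk;
	return addr;
}

/* Top-down: returns an address in the !X region, or 0 on failure. */
static uintptr_t alloc_data(struct mod_range *r, uintptr_t size)
{
	size = align_up(size, PAGE_SZ);
	if (size > r->data_brk - r->text_limit)
		return 0; /* would cross into the X region */
	r->data_brk -= size;
	assert(r->text_limit <= r->data_brk); /* X and !X stay separated */
	return r->data_brk;
}
```

In this model the first text allocation in a 16M range claims the whole first 2M for X, and data allocations are then confined to what is left above that boundary. The real thing would sit on top of the vmap_area tree rather than two bump pointers.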