Re: [PATCH bpf-next 1/3] mm/vmalloc: introduce vmalloc_exec which allocates RO+X memory

Song Liu <songliubraving@xxxxxx> · Fri, 5 Aug 2022 05:29:51 +0000

Hi Peter,

> On Jul 13, 2022, at 3:20 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> 

[...]

> 
> So how about instead we separate them? Then much of the problem goes
> away, you don't need to track these 2M chunks at all.
> 
> Start by adding VM_TOPDOWN_VMAP, which instead of returning the lowest
> (leftmost) vmap_area that fits, picks the highest (rightmost).
> 
> Then add module_alloc_data() that uses VM_TOPDOWN_VMAP and make
> ARCH_WANTS_MODULE_DATA_IN_VMALLOC use that instead of vmalloc (with a
> weak function doing the vmalloc).
> 
> This gets you: the bottom of the module range is RO+X only, while the
> top is shattered between different !X types.
> 
> Then track the boundary between X and !X and ensure module_alloc_data()
> and module_alloc() never cross over and stay strictly separated.
> 
> Then change all module_alloc() users to expect RO+X memory, instead of
> RW.
> 
> Then make sure any extension of the X range is 2M aligned.
> 
> And presto, *everybody* always uses 2M TLB for text, modules, bpf,
> ftrace, the lot and nobody is tracking chunks.
> 
> Maybe migration can be eased by instead providing module_alloc_text()
> and ARCH_WANTS_MODULE_ALLOC_TEXT.
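To check my understanding of the layout above, here is a toy model of a single module range: RO+X text grows up from the bottom, !X data grows down from the top (the effect VM_TOPDOWN_VMAP is meant to have), and the X/!X boundary only ever moves in 2M steps, so the two sides never cross. This is a minimal userspace sketch, assuming a flat range with no frees and no page-level bookkeeping; struct mod_range, mod_range_init(), alloc_text() and alloc_data() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_2M ((uintptr_t)2 << 20)

/* Hypothetical model of the module address range (not kernel code). */
struct mod_range {
	uintptr_t start, end;	/* [start, end) of the module area */
	uintptr_t text_next;	/* next free byte for RO+X text, grows upward */
	uintptr_t x_limit;	/* X/!X boundary, always 2M aligned */
	uintptr_t data_low;	/* lowest address handed out as !X data, grows downward */
};

static uintptr_t round_up_to(uintptr_t v, uintptr_t align)
{
	return (v + align - 1) / align * align;
}

static void mod_range_init(struct mod_range *r, uintptr_t start, uintptr_t size)
{
	assert(start % SZ_2M == 0);	/* X region starts on a 2M boundary */
	r->start = start;
	r->end = start + size;
	r->text_next = start;
	r->x_limit = start;		/* empty X region */
	r->data_low = r->end;		/* empty !X region */
}

/* Text (RO+X): bottom-up; the X region only ever grows in 2M steps. */
static uintptr_t alloc_text(struct mod_range *r, uintptr_t size)
{
	uintptr_t addr = r->text_next;

	if (size > r->data_low - addr)
		return 0;			/* no room at all */
	if (addr + size > r->x_limit) {
		uintptr_t new_limit = round_up_to(addr + size, SZ_2M);

		if (new_limit > r->data_low)
			return 0;		/* 2M extension would overlap !X data */
		r->x_limit = new_limit;
	}
	r->text_next = addr + size;
	return addr;
}

/* Data (!X): top-down, like VM_TOPDOWN_VMAP; never goes below x_limit. */
static uintptr_t alloc_data(struct mod_range *r, uintptr_t size)
{
	if (size > r->data_low - r->x_limit)
		return 0;
	r->data_low -= size;
	return r->data_low;
}
```

The 2M rounding in alloc_text() keeps x_limit, and therefore the whole X region, 2M aligned, which is what would let it be mapped with 2M pages; a text request whose 2M-rounded end would overlap existing data simply fails instead of shrinking the boundary.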

I finally got some time to look into the code. A few questions:

1. AFAICT, the vmap_area tree only works with PAGE_SIZE-aligned addresses.
   For the sharing to be more efficient, I think we need a smaller
   granularity. Will this work? Shall we pick something like 64 bytes,
   or go all the way down to 1 byte?

2. I think we will need multiple vmap_areas sharing the same vm_struct.
   Do we need to add a refcount to vm_struct?
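To make both questions concrete: with PAGE_SIZE granularity a 300-byte program occupies a full 4096-byte page, while 64-byte granularity rounds it up to 320 bytes. And if several such sub-allocations live inside one vm_struct, that vm_struct can only be released once the last of them is freed, which is what a refcount would track. A minimal sketch, assuming a simple bump allocator that never reuses freed sub-ranges; struct shared_region, region_suballoc(), region_put() and SUBALLOC_ALIGN are hypothetical names, not existing kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SUBALLOC_ALIGN 64	/* hypothetical granularity, per question 1 */

/* Hypothetical stand-in for one vm_struct shared by several sub-allocations. */
struct shared_region {
	uintptr_t base;		/* start of the region */
	size_t size;		/* total bytes in the region */
	size_t used;		/* bump pointer: bytes handed out so far */
	unsigned int refcnt;	/* one reference per live sub-allocation */
};

static size_t round_up_sz(size_t v, size_t align)
{
	return (v + align - 1) / align * align;
}

/* Carve a sub-range out of the region and take a reference on success. */
static uintptr_t region_suballoc(struct shared_region *r, size_t size)
{
	size_t sz = round_up_sz(size, SUBALLOC_ALIGN);
	uintptr_t addr;

	if (sz > r->size - r->used)
		return 0;		/* region full */
	addr = r->base + r->used;
	r->used += sz;
	r->refcnt++;
	return addr;
}

/*
 * Drop one sub-allocation's reference. Returns true when the last one is
 * gone, i.e. when the backing region could be released. Freed sub-ranges
 * are not reused in this sketch.
 */
static bool region_put(struct shared_region *r)
{
	assert(r->refcnt > 0);
	return --r->refcnt == 0;
}
```

In the kernel the counter would more naturally be a refcount_t, with refcount_dec_and_test() playing the role of region_put().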

Thanks,
Song