On Wed, Nov 9, 2022 at 3:18 AM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
>
[...]
> > >
> > > The proposed execmem_alloc() looks to me very much tailored for x86 to be
> > > used as a replacement for module_alloc(). Some architectures have
> > > module_alloc() that is quite different from the default or x86 version, so
> > > I'd expect at least some explanation how modules etc can use execmem_ APIs
> > > without breaking !x86 architectures.
> >
> > I think this is fair, but I think we should ask ourselves - how
> > much should we do in one step?
>
> I think that at least we need evidence that execmem_alloc() etc can be
> actually used by modules/ftrace/kprobes. Luis said that RFC v2 didn't work
> for him at all, so having a core MM API for code allocation that only works
> with BPF on x86 seems not right to me.

While using execmem_alloc() et al. in module support is difficult, folks are
making progress with it. For example, the prototype would have been more
difficult before CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC (introduced by
Christophe).

We also have other users that we can onboard soon: BPF trampoline on x86_64,
BPF jit and trampoline on arm64, and maybe also on powerpc and s390.

>
> > For non-text_poke() architectures, the way you can make it work is have
> > the API look like:
> > execmem_alloc()  <- Does the allocation, but not necessarily usable yet
> > execmem_write()  <- Loads the mapping, doesn't work after finish()
> > execmem_finish() <- Makes the mapping live (loaded, executable, ready)
> >
> > So for text_poke():
> > execmem_alloc()  <- reserves the mapping
> > execmem_write()  <- text_pokes() to the mapping
> > execmem_finish() <- does nothing
> >
> > And non-text_poke():
> > execmem_alloc()  <- Allocates a regular RW vmalloc allocation
> > execmem_write()  <- Writes normally to it
> > execmem_finish() <- does set_memory_ro()/set_memory_x() on it
> >
> > Non-text_poke() only gets the benefits of centralized logic, but the
> > interface works for both. This is pretty much what the perm_alloc() RFC
> > did to make it work with other arch's and modules. But to fit with the
> > existing modules code (which is actually spread all over) and also
> > handle RO sections, it also needed some additional bells and whistles.
>
> I'm less concerned about non-text_poke() part, but rather about
> restrictions where code and data can live on different architectures and
> whether these restrictions won't lead to inability to use the centralized
> logic on, say, arm64 and powerpc.
>
> For instance, if we use execmem_alloc() for modules, it means that data
> sections should be allocated separately with plain vmalloc(). Will this
> work universally? Or this will require special care with additional
> complexity in the modules code?
>
> > So the question I'm trying to ask is, how much should we target for the
> > next step? I first thought that this functionality was so intertwined,
> > it would be too hard to do iteratively. So if we want to try
> > iteratively, I'm ok if it doesn't solve everything.
>
> With execmem_alloc() as the first step I'm failing to see the large
> picture. If we want to use it for modules, how will we allocate RO data?
> With similar rodata_alloc() that uses yet another tree in vmalloc?
> How the caching of large pages in vmalloc can be made useful for use cases
> like secretmem and PKS?

If RO data causes problems with direct map fragmentation, we can use similar
logic. I think we will need another tree in vmalloc for this case. Since the
logic will be mostly identical, I personally don't think adding another tree
is a big overhead.

Thanks,
Song
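
For illustration only, here is a rough user-space sketch of the non-text_poke()
flavor of the three-step interface quoted above. It is not kernel code and not
the proposed implementation: mmap() and mprotect() stand in for a vmalloc
allocation and set_memory_ro()/set_memory_x(), and struct execmem_region, the
function signatures, and execmem_free() are hypothetical, since the thread only
names the three calls.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical bookkeeping for one region; not a kernel structure. */
struct execmem_region {
	void   *addr;
	size_t  size;
	int     finished;    /* nonzero once execmem_finish() succeeded */
};

/* Step 1: allocate a regular RW mapping. It is not executable yet. */
static int execmem_alloc(struct execmem_region *r, size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	r->addr = p;
	r->size = size;
	r->finished = 0;
	return 0;
}

/* Step 2: write normally into the region. Refused after finish. */
static int execmem_write(struct execmem_region *r, size_t off,
			 const void *src, size_t len)
{
	assert(r && r->addr);
	if (r->finished || off > r->size || len > r->size - off)
		return -1;
	memcpy((char *)r->addr + off, src, len);
	return 0;
}

/* Step 3: make the mapping live by dropping write and adding exec. */
static int execmem_finish(struct execmem_region *r)
{
	assert(r && r->addr);
	if (mprotect(r->addr, r->size, PROT_READ | PROT_EXEC) != 0)
		return -1;
	r->finished = 1;
	return 0;
}

/* Hypothetical teardown helper, not part of the quoted interface. */
static void execmem_free(struct execmem_region *r)
{
	munmap(r->addr, r->size);
	r->addr = NULL;
	r->size = 0;
}
```

A caller would run execmem_alloc(), any number of execmem_write() calls, then
execmem_finish(). On a text_poke() architecture the same sequence would apply,
with the write step doing the poking and the finish step being a no-op.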