On Fri, Jun 09, 2023 at 10:02:16AM -0700, Song Liu wrote: > On Thu, Jun 8, 2023 at 11:41 AM Mike Rapoport <rppt@xxxxxxxxxx> wrote: > > > > On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote: > > > On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland <mark.rutland@xxxxxxx> wrote: > > > > > > [...] > > > > > > > > > > Can you give more detail on what parameters you need? If the only extra > > > > > > > parameter is just "does this allocation need to live close to kernel > > > > > > > text", that's not that big of a deal. > > > > > > > > > > > > My thinking was that we at least need the start + end for each caller. That > > > > > > might be it, tbh. > > > > > > > > > > Do you mean that modules will have something like > > > > > > > > > > jit_text_alloc(size, MODULES_START, MODULES_END); > > > > > > > > > > and kprobes will have > > > > > > > > > > jit_text_alloc(size, KPROBES_START, KPROBES_END); > > > > > ? > > > > > > > > Yes. > > > > > > How about we start with two APIs: > > > jit_text_alloc(size); > > > jit_text_alloc_range(size, start, end); > > > > > > AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am > > > not quite convinced it is needed. > > > > Right now arm64 and riscv override bpf and kprobes allocations to use the > > entire vmalloc address space, but having the ability to allocate generated > > code outside of modules area may be useful for other architectures. > > > > Still the start + end for the callers feels backwards to me because the > > callers do not define the ranges, but rather the architectures, so we still > > need a way for architectures to define how they want allocate memory for > > the generated code. > > Yeah, this makes sense. > > > > > > > > It sill can be achieved with a single jit_alloc_arch_params(), just by > > > > > adding enum jit_type parameter to jit_text_alloc(). > > > > > > > > That feels backwards to me; it centralizes a bunch of information about > > > > distinct users to be able to shove that into a static array, when the callsites > > > > can pass that information. > > > > > > I think we only two type of users: module and everything else (ftrace, kprobe, > > > bpf stuff). The key differences are: > > > > > > 1. module uses text and data; while everything else only uses text. > > > 2. module code is generated by the compiler, and thus has stronger > > > requirements in address ranges; everything else are generated via some > > > JIT or manual written assembly, so they are more flexible with address > > > ranges (in JIT, we can avoid using instructions that requires a specific > > > address range). > > > > > > The next question is, can we have the two types of users share the same > > > address ranges? If not, we can reserve the preferred range for modules, > > > and let everything else use the other range. I don't see reasons to further > > > separate users in the "everything else" group. > > > > I agree that we can define only two types: modules and everything else and > > let the architectures define if they need different ranges for these two > > types, or want the same range for everything. > > > > With only two types we can have two API calls for alloc, and a single > > structure that defines the ranges etc from the architecture side rather > > than spread all over. > > > > Like something along these lines: > > > > struct execmem_range { > > unsigned long start; > > unsigned long end; > > unsigned long fallback_start; > > unsigned long fallback_end; > > pgprot_t pgprot; > > unsigned int alignment; > > }; > > > > struct execmem_modules_range { > > enum execmem_module_flags flags; > > struct execmem_range text; > > struct execmem_range data; > > }; > > > > struct execmem_jit_range { > > struct execmem_range text; > > }; > > > > struct execmem_params { > > struct execmem_modules_range modules; > > struct execmem_jit_range jit; > > }; > > > > struct execmem_params *execmem_arch_params(void); > > > > void *execmem_text_alloc(size_t size); > > void *execmem_data_alloc(size_t size); > > void execmem_free(void *ptr); > > With the jit variation, maybe we can just call these > module_[text|data]_alloc()? I was thinking about "execmem_*_alloc()" for allocations that must be close to kernel image, like modules, ftrace on x86 and s390 and maybe something else in the future. And jit_text_alloc() for allocations that can reside anywhere. I tried to find a different name for 'struct execmem_modules_range' but couldn't think of anything better than 'struct execmem_close_to_kernel', so I've left modules in the name. > btw: Depending on the implementation of the allocator, we may also > need separate free()s for text and data. > > > > > void *jit_text_alloc(size_t size); > > void jit_free(void *ptr); > > Let's just add jit_free() for completeness even if it will be the same as execmem_free() for now. > [...] > > How should we move ahead from here? > > AFAICT, all these changes can be easily extended and refactored > in the future, so we don't have to make it perfect the first time. > OTOH, having the interface committed (either this set or my > module_alloc_type version) can unblock works in the binpack > allocator and the users side. Therefore, I think we can move > relatively fast here? Once the interface and architecture abstraction is ready we can work on the allocator and the users. We also need to update text_poking/alternatives on architectures that would allocate executable memory as ROX. I did some quick tests and with these patches 'modprobe xfs' takes tens time more than before. > Thanks, > Song -- Sincerely yours, Mike.