> On Oct 14, 2022, at 8:42 AM, Edgecombe, Rick P <rick.p.edgecombe@xxxxxxxxx> wrote:
> 
> On Fri, 2022-10-07 at 16:43 -0700, Song Liu wrote:
>> diff --git a/kernel/module/main.c b/kernel/module/main.c
>> index a4e4d84b6f4e..b44806e31a56 100644
>> --- a/kernel/module/main.c
>> +++ b/kernel/module/main.c
>> @@ -53,6 +53,7 @@
>>  #include <linux/bsearch.h>
>>  #include <linux/dynamic_debug.h>
>>  #include <linux/audit.h>
>> +#include <linux/bpf.h>
>>  #include <uapi/linux/module.h>
>>  #include "internal.h"
>> 
>> @@ -1203,7 +1204,7 @@ static void free_module(struct module *mod)
>>  	lockdep_free_key_range(mod->data_layout.base, mod->data_layout.size);
>> 
>>  	/* Finally, free the core (containing the module structure) */
>> -	module_memfree(mod->core_layout.base);
>> +	vfree_exec(mod->core_layout.base);
>>  #ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
>>  	vfree(mod->data_layout.base);
>>  #endif
>> @@ -1321,7 +1322,8 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
>>  		ksym = resolve_symbol_wait(mod, info, name);
>>  		/* Ok if resolved.  */
>>  		if (ksym && !IS_ERR(ksym)) {
>> -			sym[i].st_value = kernel_symbol_value(ksym);
>> +			unsigned long val = kernel_symbol_value(ksym);
>> +			bpf_arch_text_copy(&sym[i].st_value, &val, sizeof(val));
> 
> Why bpf_arch_text_copy()? This of course won't work for other
> architectures, so there needs to be a fallback method. That RFC broke
> the operation into two stages: loading and finalized. When loading, on
> non-x86 the writes would simply be to the allocation mapped as
> writable. When it was finalized, it changed it to its final permission
> (RO, etc.). Then for x86 it does text_poke() for the writes and has it
> RO from the beginning.

Yeah, this one (3/4) is really a prototype to show vmalloc_exec could
work for modules (with a lot more work, of course). And something to
replace bpf_arch_text_copy() is one of the issues we need to address
in the future.

> 
> I ended up needing a staging buffer for modules too, so that the code
> could operate on it directly. I can't remember why that was; it might
> be unneeded now since you moved data out of the core allocation.

Both bpf_jit and bpf_dispatcher use a staging buffer with
bpf_prog_pack. The benefit of this approach is that it minimizes the
number of text_poke/copy() calls. OTOH, it is quite a pain to get all
the relative calls right, as the staging buffer has a different
address than the final allocation.

I think we may not need the staging buffer for modules, as module
load/unload happens less often than BPF program JITs (so it is OK for
it to be slightly slower).

btw: I cannot take credit for splitting module data out of the core
allocation; Christophe Leroy did that work. :)

Thanks,
Song
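
P.S. In case a concrete example of the relative-call problem helps:
below is a rough sketch, not the actual JIT code. emit_rel_call(),
'staging', 'ro_image' and 'image_size' are made-up names for
illustration only. The point is just that the rel32 displacement has
to be computed against the address the instruction will have in the
final read-only allocation, not the address it has in the staging
buffer.

#include <linux/string.h>	/* memcpy */
#include <linux/types.h>	/* u8, s32 */

/*
 * Sketch only: write one x86-64 "call rel32" into a staging buffer
 * that will later be copied to its final, read-only location.
 */
static void emit_rel_call(u8 *staging, u8 *ro_image,
			  unsigned long insn_off, void *target)
{
	u8 *final_insn = ro_image + insn_off;	/* address after the copy */
	u8 *staged_insn = staging + insn_off;	/* where we write it now */
	/* rel32 is relative to the end of the 5-byte call instruction */
	s32 rel = (s32)((unsigned long)target -
			(unsigned long)(final_insn + 5));

	staged_insn[0] = 0xe8;			/* opcode: call rel32 */
	memcpy(&staged_insn[1], &rel, sizeof(rel));

	/*
	 * The whole image is copied later in one shot, roughly:
	 *	text_poke_copy(ro_image, staging, image_size);
	 * Computing 'rel' against 'staged_insn' instead of 'final_insn'
	 * would make the call land in the wrong place after the copy.
	 */
}

Doing it this way keeps the number of text_poke/copy() calls down to
one per image, which is why the staging buffer is attractive despite
the extra bookkeeping.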