Re: [RFC v2 3/4] modules, x86: use vmalloc_exec for module core

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Oct 14, 2022, at 8:42 AM, Edgecombe, Rick P <rick.p.edgecombe@xxxxxxxxx> wrote:
> 
> On Fri, 2022-10-07 at 16:43 -0700, Song Liu wrote:
>> diff --git a/kernel/module/main.c b/kernel/module/main.c
>> index a4e4d84b6f4e..b44806e31a56 100644
>> --- a/kernel/module/main.c
>> +++ b/kernel/module/main.c
>> @@ -53,6 +53,7 @@
>> #include <linux/bsearch.h>
>> #include <linux/dynamic_debug.h>
>> #include <linux/audit.h>
>> +#include <linux/bpf.h>
>> #include <uapi/linux/module.h>
>> #include "internal.h"
>> 
>> @@ -1203,7 +1204,7 @@ static void free_module(struct module *mod)
>>        lockdep_free_key_range(mod->data_layout.base, mod-
>>> data_layout.size);
>> 
>>        /* Finally, free the core (containing the module structure)
>> */
>> -       module_memfree(mod->core_layout.base);
>> +       vfree_exec(mod->core_layout.base);
>> #ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
>>        vfree(mod->data_layout.base);
>> #endif
>> @@ -1321,7 +1322,8 @@ static int simplify_symbols(struct module *mod,
>> const struct load_info *info)
>>                        ksym = resolve_symbol_wait(mod, info, name);
>>                        /* Ok if resolved.  */
>>                        if (ksym && !IS_ERR(ksym)) {
>> -                               sym[i].st_value =
>> kernel_symbol_value(ksym);
>> +                               unsigned long val =
>> kernel_symbol_value(ksym);
>> +                               bpf_arch_text_copy(&sym[i].st_value,
>> &val, sizeof(val));
> 
> Why bpf_arch_text_copy()? This of course won't work for other
> architectures. So there needs to be fallback method. That RFC broke the
> operation into two stages: Loading and finalized. When loading, on non-
> x86 the writes would simply be to the allocation mapped as writable.
> When it was finalized it changed it to it's final permission (RO, etc).
> Then for x86 it does text_pokes() for the writes and has it RO from the
> beginning.

Yeah, this one (3/4) is really a prototype to show vmalloc_exec could 
work for modules (with a lot more work of course). And something to
replace bpf_arch_text_copy() is one of the issues we need to address in
the future. 

> 
> I ended up needing a staging buffer for modules too, so that the code
> could operate on it directly. I can't remember why that was, it might
> be unneeded now since you moved data out of the core allocation.

Both bpf_jit and bpf_dispather uses a staging buffer with bpf_prog_pack. 
The benefit of this approach is that it minimizes the number of 
text_poke/copy() calls. OTOH, it is quite a pain to make all the 
relative calls correct, as the staging buffer has different address to 
the final allocation. 

I think we may not need the staging buffer for modules, as module 
load/unload happens less often than BPF program JITs (so it is ok for 
it to be slightly slower). 

btw: I cannot take credit for split module data out of core allocation,
Christophe Leroy did the work. :)

Thanks,
Song




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux