Hi Luis, Thanks for your kind review! On Wed, Jan 25, 2023 at 10:06 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote: > [...] > > > > MOD_MEM_TYPE_TEXT, > > MOD_MEM_TYPE_DATA, > > MOD_MEM_TYPE_RODATA, > > MOD_MEM_TYPE_RO_AFTER_INIT, > > MOD_MEM_TYPE_INIT_TEXT, > > MOD_MEM_TYPE_INIT_DATA, > > MOD_MEM_TYPE_INIT_RODATA, > > > > and allocating them separately. > > First thanks for doing this work! > > This seems to not acknolwedge the original goal of the first module_layout and > the latched rb-tree use, and that was was for speeding up __module_address() > since it *can* even be triggered on NMIs. I say this because the first question > that comes to me is the impact to performance on __module_address() I can't > see that changing much here, but mention it as it similar consideration > should be made in case future changes modify this path. To make sure I understand this correctly. Do you mean we need something like the following in the commit log? """ This adds slightly more entries to mod_tree (from up to 3 entries per module, to up to 7 entries per module). However, this at most adds a small constant overhead to __module_address(), which is expected to be fast. """ > > Microbenching something so trivial as __module_address() may not be as useful > for an idle system, at the very least being able to compare before and after > even on idle may be useful *if* you eventually do some more radical changes > here. Modules-related kernel/kallsyms_selftest.c did that for kallsyms_lookup_name() > and friend just recently for a minor performance enhancement. kernel/kallsyms_selftest.c is new to me. I will give it a try. > > At a high level it is perhaps my only conern so far while reviewing this patch, > and I am glad it is now clear that addressing modules is a requirement. The rest > seems like a highly welcomed cleanup even though the diffstat does not reflect > that. > > > Various archs use module_layout for different data. These data are put > > into different module_memory based on their location in module_layout. > > IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT; > > data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc. > > I think the commit log should document a bit the rationale for why > ARCH_WANTS_MODULES_DATA_IN_VMALLOC gets *can* tidied up so well here. Great point! Will add this in v3. > > > Signed-off-by: Song Liu <song@xxxxxxxxxx> > > Cc: Luis Chamberlain <mcgrof@xxxxxxxxxx> > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > > > > --- > > > > This is the preparation work for the type aware module_alloc() discussed > > in [1]. While this work is not covered much in the discussion, it is a > > critical step of the effort. > > > > As this part grows pretty big (~1000 lines, + and -), I would like get > > some feedback on it, so that I know it is on the right track. > > > > Please share your comments. Thanks! > > > > Test coverage: Tested on x86_64. > > I will likely merge this onto modules-next soon, not because I think it is > ready, but just because I think it *is* mostly ready and the next thing > we need is exposure and testing. rc5 is pretty late to consider this > for v6.3 and so hopefully for this cycle we can at least settle on > something which will sit in linux-next since the respective linux-next > after v6.3-rc1 is released. Yes, this definitely needs more tests. Given different archs use module_layout in all sorts of ways. I will be very surprised if I updated all them correctly (though I tried hard to). > > > Build tested by kernel test bot in [2]. The only regression in [2] was a > > typo in parisc, which is also fixed. > > > > [1] https://lore.kernel.org/linux-mm/20221107223921.3451913-1-song@xxxxxxxxxx/T/#u > > You still never addressed my performance suggestions so don't be > surprised if I insist later. Yes you can use existing performance > benchmarks, specially now with modules as a hard requirement, to > show gains. So I'd like to clarify that if I'm reviewing late it is > because: > > a) my modules patch review queue has been high as of late > b) you seem to not have taken these performance suggestions into consideration > before and so I tend to put it at my end of my queue for review. I think it will be a lot easier to run performance tests with the module support. Let's see what we can do when we get to the performance test part. [...] > > > @@ -1389,16 +1466,29 @@ unsigned int __weak arch_mod_section_prepend(struct module *mod, > > return 0; > > } > > > > -/* Update size with this section: return offset. */ > > -long module_get_offset(struct module *mod, unsigned int *size, > > - Elf_Shdr *sechdr, unsigned int section) > > +/* > > + * Use highest 4 bits of sh_entsize to store the mod_mem_type of this > > + * section. This leaves 28 bits for offset on 32-bit systems, which is > > + * about 256 MiB (WARN_ON_ONCE if we exceed that). > > + */ > > + > > +#define SH_ENTSIZE_TYPE_BITS 4 > > +#define SH_ENTSIZE_TYPE_SHIFT (BITS_PER_LONG - SH_ENTSIZE_TYPE_BITS) > > +#define SH_ENTSIZE_TYPE_MASK ((1UL << SH_ENTSIZE_TYPE_BITS) - 1) > > +#define SH_ENTSIZE_OFFSET_MASK ((1UL << (BITS_PER_LONG - SH_ENTSIZE_TYPE_BITS)) - 1) > > + > > +long module_get_offset_and_type(struct module *mod, enum mod_mem_type type, > > + Elf_Shdr *sechdr, unsigned int section) > > { > > - long ret; > > + long offset; > > + long mask = ((unsigned long)(type) & SH_ENTSIZE_TYPE_MASK) << SH_ENTSIZE_TYPE_SHIFT; > > > > - *size += arch_mod_section_prepend(mod, section); > > - ret = ALIGN(*size, sechdr->sh_addralign ?: 1); > > - *size = ret + sechdr->sh_size; > > - return ret; > > + mod->mod_mem[type].size += arch_mod_section_prepend(mod, section); > > + offset = ALIGN(mod->mod_mem[type].size, sechdr->sh_addralign ?: 1); > > + mod->mod_mem[type].size = offset + sechdr->sh_size; > > + > > + WARN_ON_ONCE(offset & mask); > > + return offset | mask; > > } > > Could you split this somehow with the mask stuff up into its own > non-functional patch first somehow to make this easier to review? > > No worries if you can't. This is not very straightforward with INIT_OFFSET_MASK. So I will keep it as-is for now. Btw, while looking into this, I found a leftover INIT_OFFSET_MASK that I will remove in v3. > > > @@ -2117,74 +2193,45 @@ static int find_module_sections(struct module *mod, struct load_info *info) > > > > static int move_module(struct module *mod, struct load_info *info) > > { > > + enum mod_mem_type type; > > int i; > > void *ptr; > > > > - /* Do the allocs. */ > > - ptr = module_alloc(mod->core_layout.size); > > - /* > > - * The pointer to this block is stored in the module structure > > - * which is inside the block. Just mark it as not being a > > - * leak. > > - */ > > - kmemleak_not_leak(ptr); > > - if (!ptr) > > - return -ENOMEM; > > - > > - memset(ptr, 0, mod->core_layout.size); > > - mod->core_layout.base = ptr; > > + for_each_mod_mem_type(type) { > > + if (!mod->mod_mem[type].size) { > > + mod->mod_mem[type].base = NULL; > > + continue; > > + } > > + mod->mod_mem[type].size = PAGE_ALIGN(mod->mod_mem[type].size); > > + if (mod_mem_use_vmalloc(type)) > > + ptr = vzalloc(mod->mod_mem[type].size); > > + else > > + ptr = module_alloc(mod->mod_mem[type].size); > > This form to check for mod_mem_use_vmalloc() is used twice, how about > a helper to just do it one line for us? Good idea! Will update in v3. > > > @@ -2195,6 +2242,14 @@ static int move_module(struct module *mod, struct load_info *info) > > } > > > > return 0; > > +out_enomem: > > + for (type--; type >= 0; type--) { > > + if (mod_mem_use_vmalloc(type)) > > + vfree(mod->mod_mem[type].base); > > + else > > + module_memfree(mod->mod_mem[type].base); > > + } > > + return -ENOMEM; > > } > > Here's the other user. > > > diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c > > This was the hardest file to review and so I ask the same, if a separate > non-functional change could be done first to make the changes easier > to review. AFAICT, strict_rwx.c is pretty straightforward after the change. And it is just ~100 LOC. The non-functional change is very likely to be more complicated. So I will keep this part as-is. > > No worries if not. > > Reviewed-by: Luis Chamberlain <mcgrof@xxxxxxxxxx> I will fix these, rebase, and send v3. Thanks again! Song