On Fri, May 26, 2023 at 05:03:18PM -0700, Song Liu wrote: > On Fri, May 26, 2023 at 4:39 PM Kent Overstreet > <kent.overstreet@xxxxxxxxx> wrote: > > > [...] > > > > > But it should be an internal implementation detail, I don't think we > > want the external interface tied to vmalloc - > > > > > These two APIs allow the core code work with all archs. They won't break > > > sub-page allocations. (not all archs will have sub-page allocations) > > > > So yes, text_poke() doesn't work on all architectures, and we need a > > fallback. > > > > But if the fallback needs to go the unprotect/protect route, I think we > > need to put that in the helper, and not expose it. Part of what people > > are wanting is to limit or eliminate pages that are RWX, so we > > definitely shouldn't be creating new interfaces to flipping page > > permissions: that should be pushed down to as low a level as possible. > > > > E.g., with my jitalloc_update(), a more complete version would look like > > > > void jitalloc_update(void *dst, void *src, size_t len) > > { > > if (text_poke_available) { > > text_poke(...); > > } else { > > unprotect(); > > memcpy(); > > protect(); > > } > > } > > I think this doesn't work for sub page allocation? Perhaps I elided too much - it does if you add a single lock. You can't do that if it's not a common helper. > At the end of all this, we will have modules running from huge pages, data > and text. It will give significant performance benefit when some key driver > cannot be compiled into the kernel. Yeah, I've seen the numbers for the perf impact of running as a module due to the extra TLB overhead - but Mike's recent data was showing that this doesn't matter nearly as much as data as it does for text. > > Also - module_memory_fill_type(), module_memory_invalidate_type() - I > > know these are for BPF, but could you explain why we need them in the > > external interface here? Could they perhaps be small helpers in the bpf > > code that use something like jitalloc_update()? > > These are used by all users, not just BPF. 1/3 uses them in > kernel/module/main.c. I didn't use them in 3/3 as it is arch code, but I can > use them instead of text_poke_* (and that is probably better code style). > As I am writing the email, I think I should use it in ftrace.c (2/3) to avoid > the __weak function. Ok. Could we make it clearer why they're needed and what they're for? I know bpf fills in invalid instructions initially; I didn't read through enough code to find the "why", and let's document that to save other people the same effort. And do they really need to be callbacks in mod_alloc_params...? Do we need the other things in mod_alloc_params/vmalloc_params? - your granularity field says it's for specifying PAGE/PMD size: we definitely do not want that. We've had way too much code along the lines of "implement hugepages for x", we need to stop doing that. It's an internal mm/ detail. - vmalloc_params: we need gfp_t and pgprot_t, but those should be normal arguments. start/end/vm_flags? They seem like historical module baggage, can we get rid of them? > > > OTOH, jit_alloc (as-is) solves a subset of the problem (only the text). > > > We can probably use it (with some updates) instead of the vmap_area > > > based allocator. But I guess we need to align the direction first. > > > > Let's see if we can get tglx to chime in again, since he was pretty > > vocal in the last discussion. > > > > It's also good practice to try to summarize all the relevant "whys" from > > the discussion that went into a feature and put that in at least the > > commit message - or even better, header file comments. > > > > Also, organization: the module code is a huge mess, let's _please_ do > > split this out and think about organization a bit more, not add to the > > mess :) > > I don't really think module code is very messy at the moment. If > necessary, we can put module_alloc_type related code in a > separate file. Hey, it's been organized since I last looked at it :) But I tihink this belongs in mm/, not module code.