On Wed, Feb 21, 2024 at 09:40:48AM -0800, Luis Chamberlain wrote: > + live-patching folks, > > Finally, things are starting to be much clearer. Thanks for the time > for working on this, some more comments below and a question which > I think deserves some attention. > > On Sat, Feb 17, 2024 at 04:18:10PM +0800, Changbin Du wrote: > > The synchronization here is just to ensure the module init's been freed > > before doing W+X checking. > > Some nits, this should read instead: > > Fix the ordering of freeing of a module init so that it happens before > W+X checking. > > > But the commit 1a7b7d922081 ("modules: Use > > vmalloc special flag") moves do_free_init() into a global workqueue > > instead of call_rcu(). So now rcu_barrier() can not ensure that do_free_init > > has completed. We should wait it via flush_work(). > > Remove "But" and adjust as: > > Commit 1a7b7d922081 ("modules: Use vmalloc special flag") moved > calling do_free_init() into a global workqueue instead of relying on it > being called through call_rcu(..., do_free_init), which used to allowed us > call do_free_init() asynchronously after the end of a subsequent grace > period. The move to a global workqueue broke the gaurantees for code > which needed to be sure the do_free_init() would complete with rcu_barrier(). > To fix this callers which used to rely on rcu_barrier() must now instead > use flush_work(&init_free_wq). > Sure, thanks! > > Without this fix, we still could encounter false positive reports in > > W+X checking, > > This is good thanks for the clarification. > > I think it would be useful for the commit log then to describe also that > it is not that the freeing was not happening, it is just that our sanity > checkers raced against the permission checkers which assume init memory > is already gone. > okay, I'll apend this detailed explanation. > > and the rcu synchronization is unnecessary which can > > introduce significant delay. > > While this can be true, I am not sure if we can remove it. See below. > > > Eric Chanudet reports that the rcu_barrier introduces ~0.1s delay on a > > PREEMPT_RT kernel. > > That's a separate issue. > > > [ 0.291444] Freeing unused kernel memory: 5568K > > [ 0.402442] Run /sbin/init as init process > > > > With this fix, the above delay can be eliminated. > > > > Fixes: 1a7b7d922081 ("modules: Use vmalloc special flag") > > Signed-off-by: Changbin Du <changbin.du@xxxxxxxxxx> > > Cc: Xiaoyi Su <suxiaoyi@xxxxxxxxxx> > > Cc: Eric Chanudet <echanude@xxxxxxxxxx> > > > > --- > > v3: > > - amend comment in do_init_module() and update commit msg. > > v2: > > - fix compilation issue for no CONFIG_MODULES found by 0-DAY. > > --- > > include/linux/moduleloader.h | 8 ++++++++ > > init/main.c | 5 +++-- > > kernel/module/main.c | 9 +++++++-- > > 3 files changed, 18 insertions(+), 4 deletions(-) > > > > diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h > > index 001b2ce83832..89b1e0ed9811 100644 > > --- a/include/linux/moduleloader.h > > +++ b/include/linux/moduleloader.h > > @@ -115,6 +115,14 @@ int module_finalize(const Elf_Ehdr *hdr, > > const Elf_Shdr *sechdrs, > > struct module *mod); > > > > +#ifdef CONFIG_MODULES > > +void flush_module_init_free_work(void); > > +#else > > +static inline void flush_module_init_free_work(void) > > +{ > > +} > > +#endif > > + > > /* Any cleanup needed when module leaves. */ > > void module_arch_cleanup(struct module *mod); > > > > diff --git a/init/main.c b/init/main.c > > index e24b0780fdff..f0b7e21ac67f 100644 > > --- a/init/main.c > > +++ b/init/main.c > > @@ -99,6 +99,7 @@ > > #include <linux/init_syscalls.h> > > #include <linux/stackdepot.h> > > #include <linux/randomize_kstack.h> > > +#include <linux/moduleloader.h> > > #include <net/net_namespace.h> > > > > #include <asm/io.h> > > @@ -1402,11 +1403,11 @@ static void mark_readonly(void) > > if (rodata_enabled) { > > /* > > * load_module() results in W+X mappings, which are cleaned > > - * up with call_rcu(). Let's make sure that queued work is > > + * up with init_free_wq. Let's make sure that queued work is > > * flushed so that we don't hit false positives looking for > > * insecure pages which are W+X. > > */ > > - rcu_barrier(); > > Was this the only source of waiters that used rcu_barrier() to sync ? > What about kallsyms, live-patching ? > > This original source to the addition of this rcu_barrier() (in a slight > older modified form with with rcu_barrier_sched()) was commit > ae646f0b9ca13 ("init: fix false positives in W+X checking") since > v4.17 in 2018, 6 years ago. So I'm hoping we don't have any other > side-by new users which have grown dependent on this rcu_barrier() for > other call_rcu()'s they may have used, but it is hard to tell. > Per the condtion 'rodata_enabled' and comments, I think the rcu_barrier() is only used to synchronize with freeing module init memory. > So while I agree that flush work is the right solution, removing the > rcu_barrier() is technically another change which could potentially > regress for other reasons now. It is perhaps safe, but I'm used to > surprises for minor changes like these. So I think it makes sense to > lift it now, and test it in the wild to see what could possibly break, > I'd much prefer to split this as two separate commits. One which does > the fix, and another that lifts the rcu_barrier() with the stated > rationale and savings on time of ~0.1s on PREEMPT_RT kernels. > But the only change in patch is to replace rcu_barrier() with flush_module_init_free_work(). Do you mean that keep both flush_module_init_free_work() and rcu_barrier() here? It sounds a little bit weird IMHO. > Luis -- Cheers, Changbin Du