On Thu, Oct 20, 2022 at 09:03:39AM +0200, Petr Mladek wrote:
> On Wed 2022-10-19 14:00:55, Petr Pavlu wrote:
> > On 10/18/22 20:33, Luis Chamberlain wrote:
> > > On Sat, Oct 15, 2022 at 11:27:10AM +0200, Petr Pavlu wrote:
> > >> The patch does address a regression observed after commit 6e6de3dee51a
> > >> ("kernel/module.c: Only return -EEXIST for modules that have finished
> > >> loading"). I guess it can have a Fixes tag added to the patch.
> > >>
> > >> I think it is hard to split this patch into parts because the implemented
> > >> "optimization" is the fix.
> > >
> > > git describe --contains 6e6de3dee51a
> > > v5.3-rc1~38^2~6
> > >
> > > I'm a bit torn about this situation. Reverting 6e6de3dee51a would be the
> > > right thing to do, but without it, it still leaves the issue reported
> > > by Prarit Bhargava. We need a way to resolve the issue on stable and
> > > then your optimizations can be applied on top.
> >
> > Simpler could be to do the following:
> >
> > diff --git a/kernel/module/main.c b/kernel/module/main.c
> > index d02d39c7174e..0302ac387e93 100644
> > --- a/kernel/module/main.c
> > +++ b/kernel/module/main.c
> > @@ -2386,7 +2386,8 @@ static bool finished_loading(const char *name)
> >  	sched_annotate_sleep();
> >  	mutex_lock(&module_mutex);
> >  	mod = find_module_all(name, strlen(name), true);
> > -	ret = !mod || mod->state == MODULE_STATE_LIVE;
> > +	ret = !mod || mod->state == MODULE_STATE_LIVE
> > +		|| mod->state == MODULE_STATE_GOING;
> >  	mutex_unlock(&module_mutex);
> >
> >  	return ret;
> > @@ -2566,7 +2567,8 @@ static int add_unformed_module(struct module *mod)
> >  	mutex_lock(&module_mutex);
> >  	old = find_module_all(mod->name, strlen(mod->name), true);
> >  	if (old != NULL) {
> > -		if (old->state != MODULE_STATE_LIVE) {
> > +		if (old->state == MODULE_STATE_COMING
> > +		    || old->state == MODULE_STATE_UNFORMED) {
> >  			/* Wait in case it fails to load. */
> >  			mutex_unlock(&module_mutex);
> >  			err = wait_event_interruptible(module_wq,
> > @@ -2575,7 +2577,7 @@ static int add_unformed_module(struct module *mod)
> >  				goto out_unlocked;
> >  			goto again;
> >  		}
> > -		err = -EEXIST;
> > +		err = old->state != MODULE_STATE_LIVE ? -EBUSY : -EEXIST;
> >
> >  		goto out;
> >  	}
> >  	mod_update_bounds(mod);
> >
> > This is an alternative approach to fix the issue that 6e6de3dee51a addressed
> > and it preserves the previous handling of same-module parallel loads.
> >
> > It works well in practice but a problem is that this previous handling is
> > somewhat fragile because it requires specific timings. A second load of a same
> > module returns EBUSY only if it observes the first load in the going state.
> >
> > The following can then happen:
> > * A first load of module A is requested. It passes add_unformed_module() and
> >   proceeds with full initialization.
> > * A second load of module A arrives. It proceeds up to add_unformed_module()
> >   where it waits on the first module to complete its initialization.
> > * The first load fails because its init function happens to produce an error.
> >   The cleanup code in do_init_module() unlinks the module from the modules
> >   list, frees the module and finally calls wake_up_all(&module_wq).
> > * The second load gets woken up. It sees that there is no module with the same
> >   name in the modules list and continues with its full initialization, which
> >   likely again fails in the init function.
>
> Another solution would be to add one more reference counter directly
> into struct module. The existing counter is about dependencies on the
> module. It forces the module to stay in MODULE_STATE_LIVE when there
> is some dependency. The new reference counter would be just about
> life time of struct module.
>
> It should be easier than to add new structure for passing err code.
>
> Also it would allow to remove the racy finished_loading().
> wait_event_interruptible() could just check mod->state.

Sounds good, but let us keep in mind that we *first* want a fix for stable:
one that fixes the regression from 6e6de3dee51a while still addressing the
issue that commit was meant to fix.

So I welcome patches. Let us first get a small fix in for 6e6de3dee51a, and
we can optimize on top of that afterwards.

  Luis
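
For readers following the quoted diff, the decision it makes for a second
load of a same-named module can be modelled in a few lines of userspace C.
This is only a sketch under stated assumptions: same_name_load_action(),
finished_loading_model() and LOAD_WAIT are hypothetical names that do not
exist in the kernel, the enum merely mirrors the kernel's module state names,
and neither module_mutex nor the module_wq wait queue is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the kernel's module state names; this is a userspace model only. */
enum module_state {
	MODULE_STATE_LIVE,
	MODULE_STATE_COMING,
	MODULE_STATE_GOING,
	MODULE_STATE_UNFORMED,
};

/* Hypothetical marker meaning "sleep on the wait queue, then look again". */
#define LOAD_WAIT 1

/*
 * Hypothetical model of the check in finished_loading() after the quoted
 * diff. mod == NULL stands for "no module with that name is in the list".
 */
static bool finished_loading_model(const enum module_state *mod)
{
	return !mod || *mod == MODULE_STATE_LIVE || *mod == MODULE_STATE_GOING;
}

/*
 * Hypothetical model of the branch in add_unformed_module() after the
 * quoted diff: what a second load does given the state of the first one.
 */
static int same_name_load_action(const enum module_state *old)
{
	if (old == NULL)
		return 0;		/* no conflict, the load proceeds */

	switch (*old) {
	case MODULE_STATE_COMING:
	case MODULE_STATE_UNFORMED:
		return LOAD_WAIT;	/* first load is still in progress */
	case MODULE_STATE_LIVE:
		return -EEXIST;		/* first load finished successfully */
	case MODULE_STATE_GOING:
		return -EBUSY;		/* first load is being torn down */
	}

	assert(0 && "unknown module state");
	return -EINVAL;
}
```

In this model a NULL lookup result lets the second load proceed. That is the
timing window from the quoted scenario: once the failed first load has been
unlinked from the list, the second load no longer observes a GOING module and
therefore does not get -EBUSY.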