Re: [PATCH] static_call: Handle module init failure correctly in static_call_del_module()

Daniel Gomez <da.gomez@xxxxxxxxxxx> · Fri, 8 Nov 2024 17:22:06 +0100

On Fri Nov 8, 2024 at 5:09 PM CET, Daniel Gomez wrote:
> On Fri Nov 8, 2024 at 4:49 PM CET, Luis Chamberlain wrote:
>> + Other new module maintainers
>>
>> On Fri, Nov 08, 2024 at 09:12:03AM +0100, Christophe Leroy wrote:
>>> Hi Luis,
>>> 
>>> Le 24/09/2024 à 09:22, Mike Rapoport a écrit :
>>> > On Thu, Sep 19, 2024 at 02:53:34AM -0700, Luis Chamberlain wrote:
>>> > > On Fri, Sep 06, 2024 at 04:24:56PM -0700, Luis Chamberlain wrote:
>>> > > > On Thu, Sep 05, 2024 at 11:44:00AM +0200, Thomas Gleixner wrote:
>>> > > > > Now you at least provided the information that the missing cleanup in
>>> > > > > the init() function is not the problem. So the obvious place to look is
>>> > > > > in the module core code whether there is a failure path _after_
>>> > > > > module->init() returned success.
>>> > > > > 
>>> > > > > do_init_module()
>>> > > > >          ret = do_one_initcall(mod->init);
>>> > > > >          ...
>>> > > > > 	ret = module_enable_rodata_ro(mod, true);
>>> > > > > 	if (ret)
>>> > > > > 		goto fail_mutex_unlock;
>>> > > > > 
>>> > > > > and that error path does _not_ invoke module->exit(), which is obviously
>>> > > > > not correct. Luis?
>>> > > > 
>>> > > > You're spot on this needs fixing.
>>> > > 
>>> > > Christophe, this is a regression caused by the second hunk of your commit
>>> > > d1909c0221739 ("module: Don't ignore errors from set_memory_XX()") on v6.9.
>>> > > Sadly there are a few issues with trying to get to call mod->exit():
>>> > > 
>>> > > - We should try try_stop_module()  and that can fail
>>> > > - source_list may not be empty and that would block removal
>>> > > - mod->exit may not exist
>>> > > 
>>> > > I'm wondering if instead we should try to do the module_enable_rodata_ro()
>>> > > before the init, but that requires a bit more careful evaluation...
>>> > 
>>> > There is ro_after_init section, we can't really make it RO before ->init()
>>> 
>>> Surprisingly I never received Luis's email
>>
>> So odd..
>>
>>> allthough I got this answer from Mike that I overlooked.
>>> 
>>> So coming back here from
>>> https://lore.kernel.org/all/ZyQhbHxDTRXTJgIx@xxxxxxxxxxxxxxxxxxxxxx/
>>> 
>>> As far as I understand, indeed once init is called it is too late to fail,
>>
>> Partly yes, party no. Party yes in that its a can of worms we have not
>> had to deal with before, and also I worry about deadlocks, and the code
>> to address this seems complex. right ?
>
> I have a RFC ready with this, I'll send this now so we can discuss on
> with a proposal.
>
>>
>>
>>> Especially when the module has no exit() path or when
>>> CONFIG_MODULE_UNLOAD is not built in.
>>
>> That's exactly the other extreme case I fear for.
>>
>>> So the only thing we can do then is a big fat warning telling
>>> set_memory_ro() on ro_after_init memory has failed ?
>>
>> I suspect this is more sensible to do.
>
> I came to the same conclusion while trying to fix this path. + I added
> an alternative for discussion.
>
>>
>>> Maybe we should try and change it to RO then back to RW before calling init,
>>> to be on a safer side hopping that if change to RO works once it will work
>>> twice ?
>>
>> That's another approach wich could work, if we proove that this does
>> work, it's a nice best effort and I think less or a mess to the codebase
>> then special-casing the error handling of trying to deal with the
>> driver's exit.
>>
>> Daniel Gomez has been looking at this, so his feedback here would be
>> valuable.
>
> What if we detect ro_after_init first, and block any module
> initialization depending on this ro_after_init to actually start loading
> it? That way we can stop and unload the module successfully.

In case I'm missing someone, I've just sent the RFC here:

https://lore.kernel.org/linux-modules/20241108-modules-ro_after_init-v3-0-6dd041b588a5@xxxxxxxxxxx/T/#t

Please ignore "v3" prefix. That was a mistake. Not sure why b4 added
that.

>
>>
>>   Luis