Re: [PATCH] module: Add more error message for failed kernel module loading

Prarit Bhargava <prarit@xxxxxxxxxx> · Tue, 1 Sep 2020 20:06:24 -0400

On 9/1/20 4:17 PM, Lucas De Marchi wrote:
> On Tue, Sep 1, 2020 at 12:56 PM Prarit Bhargava <prarit@xxxxxxxxxx> wrote:
>>
>>
>>
>> On 9/1/20 2:50 PM, Lucas De Marchi wrote:
>>> On Sat, Aug 29, 2020 at 4:15 AM Qu Wenruo <wqu@xxxxxxxx> wrote:
>>>>
>>>> When kernel module loading failed, user space only get one of the
>>>> following error messages:
>>>> - -ENOEXEC
>>>>   This is the most confusing one. From corrupted ELF header to bad
>>>>   WRITE|EXEC flags check introduced by in module_enforce_rwx_sections()
>>>>   all returns this error number.
>>>>
>>>> - -EPERM
>>>>   This is for blacklisted modules. But mod doesn't do extra explain
>>>>   on this error either.
>>>>
>>>> - -ENOMEM
>>>>   The only error which needs no explain.
>>>>
>>>> This means, if a user got "Exec format error" from modprobe, it provides
>>>> no meaningful way for the user to debug, and will take extra time
>>>> communicating to get extra info.
>>>>
>>>> So this patch will add extra error messages for -ENOEXEC and -EPERM
>>>> errors, allowing user to do better debugging and reporting.
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>>>> ---
>>>>  kernel/module.c | 11 +++++++++--
>>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kernel/module.c b/kernel/module.c
>>>> index 1c5cff34d9f2..9f748c6eeb48 100644
>>>> --- a/kernel/module.c
>>>> +++ b/kernel/module.c
>>>> @@ -2096,8 +2096,12 @@ static int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
>>>>         int i;
>>>>
>>>>         for (i = 0; i < hdr->e_shnum; i++) {
>>>> -               if ((sechdrs[i].sh_flags & shf_wx) == shf_wx)
>>>> +               if ((sechdrs[i].sh_flags & shf_wx) == shf_wx) {
>>>> +                       pr_err(
>>>> +                       "Module %s section %d has invalid WRITE|EXEC flags\n",
>>>> +                               mod->name, i);
>>>>                         return -ENOEXEC;
>>>> +               }
>>>>         }
>>>>
>>>>         return 0;
>>>> @@ -3825,8 +3829,10 @@ static int load_module(struct load_info *info, const char __user *uargs,
>>>>         char *after_dashes;
>>>>
>>>>         err = elf_header_check(info);
>>>> -       if (err)
>>>> +       if (err) {
>>>> +               pr_err("Module has invalid ELF header\n");
>>>>                 goto free_copy;
>>>> +       }
>>>>
>>>>         err = setup_load_info(info, flags);
>>>>         if (err)
>>>> @@ -3834,6 +3840,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
>>>>
>>>>         if (blacklisted(info->name)) {
>>>>                 err = -EPERM;
>>>> +               pr_err("Module %s is blacklisted\n", info->name);
>>>
>>> I wonder why would anyone actually add the blacklist to the command
>>> line like this and have no
>>> way to revert that back. This was introduced in
>>
>> Debug.  Debug.  Debug. ;)  The parameter was added to debug broken installations
>> and kernel boots.
>>
>>> be7de5f91fdc modules: Add kernel parameter to blacklist modules
>>> as a way to overcome broken initrd generation afaics.
>>
>> Not the generation of the initramfs, but blocking a module loaded during the
>> boot.  The installation may have failed and there's no easy way to then debug
>> what module was responsible for the failure.
> 
> if you are using initrd, then *inside* the initrd you should have the
> /etc/modprobe.d/* file
> you want. I said "broken initrd generation" because the tool should
> put the file there, and
> apparently for you it isn't.
> 
> Even if you don't have the file, you could use modprobe.blacklist= and
> let the blacklist happen
> in the userspace library rather than in the kernel. Module loading is
> not like firmware loading
> that happens without help from userspace.
> 
>>
>>  Either kernel
>>> command line (using modprobe.blacklist)
>>> or /etc/modprobe.d options are honoured by libkmod and allow a
>>> sufficiently privileged user to bypass it.
>>>
>>
>> Both of those options only work if the filesystem is mounted IIRC.  It could be
>> the case that modprobe.blacklist now works earlier in the boot, however, I've
>> never used it because module_blacklist is the biggest and best hammer that I can
>> use to get through a broken installation or boot.
>>
>> In any case you're incorrectly assuming that the system has a filesystem on it.
>> That's not necessarily the case as I'll explain below.
>>
>>> +Rusty, +Prarit: is there anything this module parameter is covering
>>> that I'm missing?
>>
>> This is the situation I have repeatedly come across :  A system at a remote site
>> will not boot any flavor of Linux.  Since the system would not install the only
>> way to debug was to provide install images to workaround a module load failure.
>> As you can imagine that is a time consuming process as well as a bad end user
>> experience.
>>
>> I got so sick of it that I wrote the code above (well similar to it anyway --
>> thanks for the review Randy ;)) to add the module_blacklist parameter to make
>> debugging an uninstallable "bricked" system easier.
>>
>> It's come in handy in some unexpected ways.  We've had situations where modules
>> have loaded and corrupted memory and blacklisting them revealed that the modules
>> were responsible for the memory corruption.
> 
> ok... but this seems a reimplementation of modprobe.blacklist= option
> in the kernel command line,
> but in kernel space, with no way to remove it after the kernel is booted.
>

That's *EXACTLY* what I want.  I do not want modprobe (because of some
misconfiguration) to load the module I've explicitly blacklisted.

P.

> Lucas De Marchi
> 
>>
>> P.
>>
>>>
>>> For the changes here,
>>>
>>> Reviewed-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
>>>
>>> thanks
>>> Lucas De Marchi
>>>
>>>>                 goto free_copy;
>>>>         }
>>>>
>>>> --
>>>> 2.27.0
>>>>
>>>
>>
>