Re: [PATCH] module: Add more error message for failed kernel module loading

Lucas De Marchi <lucas.de.marchi@xxxxxxxxx> · Tue, 1 Sep 2020 13:17:18 -0700

On Tue, Sep 1, 2020 at 12:56 PM Prarit Bhargava <prarit@xxxxxxxxxx> wrote:
>
>
>
> On 9/1/20 2:50 PM, Lucas De Marchi wrote:
> > On Sat, Aug 29, 2020 at 4:15 AM Qu Wenruo <wqu@xxxxxxxx> wrote:
> >>
> >> When kernel module loading failed, user space only get one of the
> >> following error messages:
> >> - -ENOEXEC
> >>   This is the most confusing one. From corrupted ELF header to bad
> >>   WRITE|EXEC flags check introduced by in module_enforce_rwx_sections()
> >>   all returns this error number.
> >>
> >> - -EPERM
> >>   This is for blacklisted modules. But mod doesn't do extra explain
> >>   on this error either.
> >>
> >> - -ENOMEM
> >>   The only error which needs no explain.
> >>
> >> This means, if a user got "Exec format error" from modprobe, it provides
> >> no meaningful way for the user to debug, and will take extra time
> >> communicating to get extra info.
> >>
> >> So this patch will add extra error messages for -ENOEXEC and -EPERM
> >> errors, allowing user to do better debugging and reporting.
> >>
> >> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> >> ---
> >>  kernel/module.c | 11 +++++++++--
> >>  1 file changed, 9 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/kernel/module.c b/kernel/module.c
> >> index 1c5cff34d9f2..9f748c6eeb48 100644
> >> --- a/kernel/module.c
> >> +++ b/kernel/module.c
> >> @@ -2096,8 +2096,12 @@ static int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
> >>         int i;
> >>
> >>         for (i = 0; i < hdr->e_shnum; i++) {
> >> -               if ((sechdrs[i].sh_flags & shf_wx) == shf_wx)
> >> +               if ((sechdrs[i].sh_flags & shf_wx) == shf_wx) {
> >> +                       pr_err(
> >> +                       "Module %s section %d has invalid WRITE|EXEC flags\n",
> >> +                               mod->name, i);
> >>                         return -ENOEXEC;
> >> +               }
> >>         }
> >>
> >>         return 0;
> >> @@ -3825,8 +3829,10 @@ static int load_module(struct load_info *info, const char __user *uargs,
> >>         char *after_dashes;
> >>
> >>         err = elf_header_check(info);
> >> -       if (err)
> >> +       if (err) {
> >> +               pr_err("Module has invalid ELF header\n");
> >>                 goto free_copy;
> >> +       }
> >>
> >>         err = setup_load_info(info, flags);
> >>         if (err)
> >> @@ -3834,6 +3840,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
> >>
> >>         if (blacklisted(info->name)) {
> >>                 err = -EPERM;
> >> +               pr_err("Module %s is blacklisted\n", info->name);
> >
> > I wonder why would anyone actually add the blacklist to the command
> > line like this and have no
> > way to revert that back. This was introduced in
>
> Debug.  Debug.  Debug. ;)  The parameter was added to debug broken installations
> and kernel boots.
>
> > be7de5f91fdc modules: Add kernel parameter to blacklist modules
> > as a way to overcome broken initrd generation afaics.
>
> Not the generation of the initramfs, but blocking a module loaded during the
> boot.  The installation may have failed and there's no easy way to then debug
> what module was responsible for the failure.

if you are using initrd, then *inside* the initrd you should have the
/etc/modprobe.d/* file
you want. I said "broken initrd generation" because the tool should
put the file there, and
apparently for you it isn't.

Even if you don't have the file, you could use modprobe.blacklist= and
let the blacklist happen
in the userspace library rather than in the kernel. Module loading is
not like firmware loading
that happens without help from userspace.

>
>  Either kernel
> > command line (using modprobe.blacklist)
> > or /etc/modprobe.d options are honoured by libkmod and allow a
> > sufficiently privileged user to bypass it.
> >
>
> Both of those options only work if the filesystem is mounted IIRC.  It could be
> the case that modprobe.blacklist now works earlier in the boot, however, I've
> never used it because module_blacklist is the biggest and best hammer that I can
> use to get through a broken installation or boot.
>
> In any case you're incorrectly assuming that the system has a filesystem on it.
> That's not necessarily the case as I'll explain below.
>
> > +Rusty, +Prarit: is there anything this module parameter is covering
> > that I'm missing?
>
> This is the situation I have repeatedly come across :  A system at a remote site
> will not boot any flavor of Linux.  Since the system would not install the only
> way to debug was to provide install images to workaround a module load failure.
> As you can imagine that is a time consuming process as well as a bad end user
> experience.
>
> I got so sick of it that I wrote the code above (well similar to it anyway --
> thanks for the review Randy ;)) to add the module_blacklist parameter to make
> debugging an uninstallable "bricked" system easier.
>
> It's come in handy in some unexpected ways.  We've had situations where modules
> have loaded and corrupted memory and blacklisting them revealed that the modules
> were responsible for the memory corruption.

ok... but this seems a reimplementation of modprobe.blacklist= option
in the kernel command line,
but in kernel space, with no way to remove it after the kernel is booted.

Lucas De Marchi

>
> P.
>
> >
> > For the changes here,
> >
> > Reviewed-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
> >
> > thanks
> > Lucas De Marchi
> >
> >>                 goto free_copy;
> >>         }
> >>
> >> --
> >> 2.27.0
> >>
> >
>