On Mon, Mar 05, 2018 at 05:34:57PM -0800, Alexei Starovoitov wrote: > As the first step in development of bpfilter project [1] So meta :) The URL refers an lwn article, which in turn refers to this effort's first RFC. As someone only getting *one* of these patches in emails, It would be useful if the cover letter referenced instead an optional git tree and branch so one could easily get the patches for more careful inspection. In the meantime, can you make such tree available with a branch? Also, since kernel/module.c is involved it would be wise to include Jessica, which I've Cc'd. I'm going to guess Kees may like to review this too. Mimi might want to help review as well. Rafael may care about suspend/resume implications of these "umh modules" as you put it. > the request_module() > code is extended to allow user mode helpers to be invoked. Upon inspection you never touch or use request_module() so this is false and misleading. You don't use or extend request_module() at all, you rely on extending finit_module() so that load_module() itself will now execute (via modified do_execve_file()) the same file which was loaded as an module. This is *very* different given request_module() has its own full magic and is in and of itself a UMH of the kernel implemented in kernel/kmod.c. Nevertheless, why? > Idea is that > user mode helpers are built as part of the kernel build and installed as > traditional kernel modules with .ko file extension into distro specified > location, such that from a distribution point of view, they are no different > than regular kernel modules. It sounds like finit_module() module loading is used as a convenience factor simply to take advantage of being able to ship / maintain/ compile these umh programs as part of the kernel. Is that right? So the finit_module() interface was just a convenient mechanism. Is that right? Ie, if folks had these binaries in place the regular UMH interface / API could be used so that these could be looked for, but instead we want to carry these in tandem with the kernel? If so this still seems like an overly complex way to deal with this. > Thus, allow request_module() logic to load such > user mode helper (umh) modules via: > > request_module("foo") -> > call_umh("modprobe foo") -> > sys_finit_module(FD of /lib/modules/.../foo.ko) -> > call_umh(struct file) OK so the use case envisioned here was for networking code to do something like: if (!loaded) { err = request_module("bpfilter"); ... } This is visible on your third patch (this is from your RFC, not this series): https://www.mail-archive.com/netfilter-devel@xxxxxxxxxxxxxxx/msg11129.html So indeed all this patch does in the end is just putting tons of wrappers in place so that kernel code can load certain trusted UMH programs we ship, and maintain in the kernel. request_module() has its own world though too. How often in your proof of concept is request_module() called? How many times do you envision it being called? Please review lib/test_kmod.c and tools/testing/selftests/kmod/ for testing your stuff too or consider extending appropriately. Are aliases something which you expect we'll need to support for these userspace... modules? > Such approach enables kernel to delegate functionality traditionally done > by kernel modules into user space processes (either root or !root) and > reduces security attack surface of such new code, meaning in case of > potential bugs only the umh would crash but not the kernel. Now, this sounds great, however I think that the proof of concept chosen is pretty complex to start off with. Even if its not designed to be a real world life use case, a much simpler proof of concept to do something more simple may be useful, if possible. One wouldn't need to to have it replace a kernel functionality in real life. lib/ is full of CONFIG_TEST_* examples, a simple new stupid kernel functionality which can in turn be replaced with a respective userspace counterpart may be useful, and both kconfig entries would be mutually exclusive. > Another > advantage coming with that would be that bpfilter.ko You mean foo.ko > can be debugged and > tested out of user space as well (e.g. opening the possibility to run > all clang sanitizers, fuzzers or test suites for checking translation). Great too. > Also, such architecture makes the kernel/user boundary very precise: > control plane is done by the user space while data plane stays in the kernel. I don't see how this is defining any boundary, I see just a loader for a userspace program, and re-using a kernel interface known, finit_module() which makes it convenient for us to load pre-compiled kernel junk. I'm still not convinced this is the right approach. > It's easy to distinguish "umh module" from traditional kernel module: Ah you said it, "umh module". I don't see what makes it a "umh module" so far, all we are doing is executing a userspace program a la UMH. But its a synchronous call, so we wait. The only modular thing here I see is we took the finit_module() to be able to loader programs compiled in the kernel. Its not using existing kernel symbols, its not exported symbols, etc. > $ readelf -h bld/net/bpfilter/bpfilter.ko|grep Type > Type: EXEC (Executable file) > $ readelf -h bld/net/ipv4/netfilter/iptable_filter.ko |grep Type > Type: REL (Relocatable file) > > Since umh can crash, can be oom-ed by the kernel, killed by admin, > the subsystem that uses them (like bpfilter) need to manage life > time of umh on its own, so module infra doesn't do any accounting > of them. Trying to avoid the "module infra" in theory sounds great, however you did extend it a bit. Also, the umh has its own infrastructure which I've slowly also been trying to groom and cleanup with a few wtf's still left in place and which if we don't take care, extending this use could backfire in other ways. Feedback below. > They don't appear in "lsmod" and cannot be "rmmod". > Multiple request_module("umh") will load multiple umh.ko processes. That also means we have a namespace collision between these programs and kernel modules. Does compilation warn if we hit a conflict here already? What about with aliases? > Similar to kernel modules the kernel will be tainted if "umh module" > has invalid signature. > > [1] https://lwn.net/Articles/747551/ > > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx> > --- > fs/exec.c | 40 +++++++++++++++++++++++++++++++--------- > include/linux/binfmts.h | 1 + > include/linux/umh.h | 3 +++ > kernel/module.c | 43 ++++++++++++++++++++++++++++++++++++++----- > kernel/umh.c | 24 +++++++++++++++++++++--- > 5 files changed, 94 insertions(+), 17 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index 7eb8d21bcab9..0483c438de7d 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1691,14 +1691,13 @@ static int exec_binprm(struct linux_binprm *bprm) > /* > * sys_execve() executes a new program. > */ > -static int do_execveat_common(int fd, struct filename *filename, > - struct user_arg_ptr argv, > - struct user_arg_ptr envp, > - int flags) > +static int __do_execve_file(int fd, struct filename *filename, > + struct user_arg_ptr argv, > + struct user_arg_ptr envp, > + int flags, struct file *file) > { > char *pathbuf = NULL; > struct linux_binprm *bprm; > - struct file *file; > struct files_struct *displaced; > int retval; > > @@ -1737,7 +1736,8 @@ static int do_execveat_common(int fd, struct filename *filename, > check_unsafe_exec(bprm); > current->in_execve = 1; > > - file = do_open_execat(fd, filename, flags); > + if (!file) > + file = do_open_execat(fd, filename, flags); > retval = PTR_ERR(file); > if (IS_ERR(file)) > goto out_unmark; > @@ -1745,7 +1745,9 @@ static int do_execveat_common(int fd, struct filename *filename, > sched_exec(); > > bprm->file = file; > - if (fd == AT_FDCWD || filename->name[0] == '/') { > + if (!filename) { > + bprm->filename = "/dev/null"; > + } else if (fd == AT_FDCWD || filename->name[0] == '/') { > bprm->filename = filename->name; > } else { > if (filename->name[0] == '\0') > @@ -1811,7 +1813,8 @@ static int do_execveat_common(int fd, struct filename *filename, > task_numa_free(current); > free_bprm(bprm); > kfree(pathbuf); > - putname(filename); > + if (filename) > + putname(filename); > if (displaced) > put_files_struct(displaced); > return retval; > @@ -1834,10 +1837,29 @@ static int do_execveat_common(int fd, struct filename *filename, > if (displaced) > reset_files_struct(displaced); > out_ret: > - putname(filename); > + if (filename) > + putname(filename); > return retval; > } > > +static int do_execveat_common(int fd, struct filename *filename, > + struct user_arg_ptr argv, > + struct user_arg_ptr envp, > + int flags) > +{ > + struct file *file = NULL; > + > + return __do_execve_file(fd, filename, argv, envp, flags, file); > +} > + > +int do_execve_file(struct file *file, void *__argv, void *__envp) > +{ > + struct user_arg_ptr argv = { .ptr.native = __argv }; > + struct user_arg_ptr envp = { .ptr.native = __envp }; > + > + return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file); > +} > + > int do_execve(struct filename *filename, > const char __user *const __user *__argv, > const char __user *const __user *__envp) > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > index b0abe21d6cc9..c783a7b9f284 100644 > --- a/include/linux/binfmts.h > +++ b/include/linux/binfmts.h > @@ -147,5 +147,6 @@ extern int do_execveat(int, struct filename *, > const char __user * const __user *, > const char __user * const __user *, > int); > +int do_execve_file(struct file *file, void *__argv, void *__envp); > > #endif /* _LINUX_BINFMTS_H */ > diff --git a/include/linux/umh.h b/include/linux/umh.h > index 244aff638220..2b10d5f70bd9 100644 > --- a/include/linux/umh.h > +++ b/include/linux/umh.h > @@ -22,6 +22,7 @@ struct subprocess_info { > const char *path; > char **argv; > char **envp; > + struct file *file; > int wait; > int retval; > int (*init)(struct subprocess_info *info, struct cred *new); > @@ -38,6 +39,8 @@ call_usermodehelper_setup(const char *path, char **argv, char **envp, > int (*init)(struct subprocess_info *info, struct cred *new), > void (*cleanup)(struct subprocess_info *), void *data); > > +struct subprocess_info *call_usermodehelper_setup_file(struct file *file); > + > extern int > call_usermodehelper_exec(struct subprocess_info *info, int wait); > > diff --git a/kernel/module.c b/kernel/module.c > index ad2d420024f6..6cfa35795741 100644 > --- a/kernel/module.c > +++ b/kernel/module.c > @@ -325,6 +325,7 @@ struct load_info { > struct { > unsigned int sym, str, mod, vers, info, pcpu; > } index; > + struct file *file; > }; > > /* > @@ -2801,6 +2802,15 @@ static int module_sig_check(struct load_info *info, int flags) > } > #endif /* !CONFIG_MODULE_SIG */ > > +static int run_umh(struct file *file) > +{ > + struct subprocess_info *sub_info = call_usermodehelper_setup_file(file); > + > + if (!sub_info) > + return -ENOMEM; > + return call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); > +} run_umh() calls the program and waits. Note that while we are running a UMH we can't suspend. What if they take forever, who is hosing them down with an equivalent: schedule(); try_to_freeze(); Say they are buggy and never return, will they stall suspend, have you tried? call_usermodehelper_exec() uses helper_lock() which in turn is used for umh's accounting for number of running umh's. One of the sad obscure uses for this is to deal with suspend/resume. Refer to __usermodehelper* calls on kernel/power/process.c Note how you use UMH_WAIT_EXEC too, so this is all synchronous. How many of these calls do you expect to be flying through? Is there any chance a system could want to keep spawning a lot of these without any chance of them yielding a bit? I'm not a fan of this use of the UMH lock thing, but the reason for it was back in the day today's firmware fallback mechanism used to be the default firmware loading interface, and it relied on a sysfs loading interface, and *with* this UMH locking thing we prevented system stalls on suspend/resume by first checking if they were supposed to run or not. But *only* the firmware loader interface uses the UMH lock API calls: o usermodehelper_read_lock_wait() o usermodehelper_read_trylock() A long term goal which I had recently was to actually get rid of these stupid calls at least out of the way from the direct firmware lookup calls since no UMH was involved and fortunately we fast-paced that that upstream [0], refer to "devil is in the details". [0] https://patchwork.kernel.org/patch/9949775/ But -- other than not stalling suspend, an other implicit reason for it, was in turn the assumption that the files would be available whenever the callers (in this case the firmware API) needed them, and then there could be a race with the UMH callers and any respective subsystem filesystem which provides the files going down. This is a *theoretical* consideration any other UMH caller needs to make today. This later race is addressed today with the firmware cache implementation, for direct firmware filesystem lookups. And yes, if you are trying to poke around files from the filesystem and are suspending/resuming you may just not get access to some of them today, I just dealt with one case recently reported [1]. So the firmware cache is *still* required today. Seeing any new broad user of of the UMH gives me concern if the above lessons are not taken into consideration. Do we *need* these files to be sure to be present during suspend/resume? How are we sure we won't race with suspend/resume? What if the same file is loaded multiple times, why not have one shared file pointer, and have all users use that while such requests are in place? The firmware loader has this in place with "batched requests". [1] https://lkml.kernel.org/r/20180227232101.20786-1-mcgrof@xxxxxxxxxx If it sounds like the direct firmware loading interface is more robust, and reliable, and tested to load files its because that is exactly what it was designed to do. The kernel UMH is a simple interface with limited use and I'd caution further *extensive* uses of it. > + > /* Sanity checks against invalid binaries, wrong arch, weird elf version. */ > static int elf_header_check(struct load_info *info) > { > @@ -2808,7 +2818,6 @@ static int elf_header_check(struct load_info *info) > return -ENOEXEC; > > if (memcmp(info->hdr->e_ident, ELFMAG, SELFMAG) != 0 > - || info->hdr->e_type != ET_REL > || !elf_check_arch(info->hdr) > || info->hdr->e_shentsize != sizeof(Elf_Shdr)) > return -ENOEXEC; > @@ -2818,6 +2827,11 @@ static int elf_header_check(struct load_info *info) > info->len - info->hdr->e_shoff)) > return -ENOEXEC; > > + if (info->hdr->e_type == ET_EXEC) > + return run_umh(info->file); Note you run_umh() on elf_header_check(). > + > + if (info->hdr->e_type != ET_REL) > + return -ENOEXEC; > return 0; > } > > @@ -3669,6 +3683,17 @@ static int load_module(struct load_info *info, const char __user *uargs, The missing context line below is: err = elf_header_check(info); > if (err) > goto free_copy; > > + if (info->hdr->e_type == ET_EXEC) { > +#ifdef CONFIG_MODULE_SIG > + if (!info->sig_ok) { > + pr_notice_once("umh %s verification failed: signature and/or required key missing - tainting kernel\n", > + info->file->f_path.dentry->d_name.name); > + add_taint(TAINT_UNSIGNED_MODULE, LOCKDEP_STILL_OK); > + } > +#endif So I guess this check is done *after* run_umh() then, what about the enforce mode, don't we want to reject loading at all in any circumstance? > + return 0; > + } > + > /* Figure out module layout, and allocate all the memory. */ > mod = layout_and_allocate(info, flags); > if (IS_ERR(mod)) { > @@ -3856,6 +3881,7 @@ SYSCALL_DEFINE3(init_module, void __user *, umod, > SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags) > { > struct load_info info = { }; > + struct fd f; > loff_t size; > void *hdr; > int err; > @@ -3870,14 +3896,21 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags) > |MODULE_INIT_IGNORE_VERMAGIC)) > return -EINVAL; > > - err = kernel_read_file_from_fd(fd, &hdr, &size, INT_MAX, > - READING_MODULE); > + f = fdget(fd); > + if (!f.file) > + return -EBADF; > + > + err = kernel_read_file(f.file, &hdr, &size, INT_MAX, READING_MODULE); > if (err) I wonder if from an LSM perspective security_kernel_read_file() and READING_MODULE will suffice or if we want to have a different identifier here. Luis > - return err; > + goto out; > info.hdr = hdr; > info.len = size; > + info.file = f.file; > > - return load_module(&info, uargs, flags); > + err = load_module(&info, uargs, flags); > +out: > + fdput(f); > + return err; > } > > static inline int within(unsigned long addr, void *start, unsigned long size) > diff --git a/kernel/umh.c b/kernel/umh.c > index 18e5fa4b0e71..4361c694bdb1 100644 > --- a/kernel/umh.c > +++ b/kernel/umh.c > @@ -97,9 +97,13 @@ static int call_usermodehelper_exec_async(void *data) > > commit_creds(new); > > - retval = do_execve(getname_kernel(sub_info->path), > - (const char __user *const __user *)sub_info->argv, > - (const char __user *const __user *)sub_info->envp); > + if (sub_info->file) > + retval = do_execve_file(sub_info->file, > + sub_info->argv, sub_info->envp); > + else > + retval = do_execve(getname_kernel(sub_info->path), > + (const char __user *const __user *)sub_info->argv, > + (const char __user *const __user *)sub_info->envp); > out: > sub_info->retval = retval; > /* > @@ -393,6 +397,20 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, > } > EXPORT_SYMBOL(call_usermodehelper_setup); > > +struct subprocess_info *call_usermodehelper_setup_file(struct file *file) > +{ > + struct subprocess_info *sub_info; > + > + sub_info = kzalloc(sizeof(struct subprocess_info), GFP_KERNEL); > + if (!sub_info) > + return NULL; > + > + INIT_WORK(&sub_info->work, call_usermodehelper_exec_work); > + sub_info->path = "/dev/null"; > + sub_info->file = file; > + return sub_info; > +} > + > /** > * call_usermodehelper_exec - start a usermode application > * @sub_info: information about the subprocessa > -- > 2.9.5 > > -- Luis Rodriguez, SUSE LINUX GmbH Maxfeldstrasse 5; D-90409 Nuernberg -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html