Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes:

> On Tue, Jun 09, 2020 at 03:02:30PM -0500, Eric W. Biederman wrote:
>> Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes:
>>
>> > bpf_lsm is that thing that needs to load and start acting early.
>> > It's somewhat chicken and egg. fork_usermode_blob() will start a process
>> > that will load and apply security policy to all further forks and
>> > execs.
>>
>> What is the timeframe for bpf_lsm patches wanting to use
>> fork_usermode_blob()?
>>
>> Are we possibly looking at something that will be ready for the next
>> merge window?
>
> In bpf space there are these that want to use usermode_blobs:
> 1. bpfilter itself.
> First of all I think we made a mistake delaying landing the main patches:
> https://lore.kernel.org/patchwork/patch/902785/
> https://lore.kernel.org/patchwork/patch/902783/
> without them bpfilter is indeed dead. That probably was the reason
> no one was brave enough to continue working on it.
> So I think the landed skeleton of bpfilter can be removed.
> I think no user space code will notice that include/uapi/linux/bpfilter.h
> is gone. So it won't be considered as user space breakage.
> Similarly CONFIG_BPFILTER can be nuked too.
> bpftool is checking for it (see tools/bpf/bpftool/feature.c)
> but it's fine to remove it.
> I still think that the approach taken was a correct one, but
> lifting that project off the ground was too much for three of us.
> So when it's staffed appropriately we can re-add that code.
>
> 2. bpf_lsm.
> It's very active at the moment. I'm working on it as well
> (sleepable progs is targeting that), but I'm not sure when folks
> would have to have it during the boot. So far it sounds that
> they're addressing more critical needs first. "bpf_lsm ready at boot"
> came up several times during "bpf office hours" conference calls,
> so it's certainly on the radar. If I to guess I don't think
> bpf_lsm will use usermode_blobs in the next 6 weeks.
> More likely 2-4 month.
>
> 3. bpf iterator.
> It's already capable extension of several things in /proc.
> See https://lore.kernel.org/bpf/20200509175921.2477493-1-yhs@xxxxxx/
> Cat-ing bpf program as "cat /sys/fs/bpf/my_ipv6_route"
> will produce the same human output as "cat /proc/net/ipv6_route".
> The key difference is that bpf is all tracing based and it's unstable.
> struct fib6_info can change and prog will stop loading.
> There are few FIXME in there. That is being addressed right now.
> After that the next step is to make cat-able progs available
> right after boot via usermode_blobs.
> Unlike cases 1 and 2 here we don't care that they appear before pid 1.
> They can certainly be chef installed and started as services.
> But they are kernel dependent, so deploying them to production
> is much more complicated when they're done as separate rpm.
> Testing is harder and so on. Operational issues pile up when something
> that almost like kernel module is done as a separate package.
> Hence usermode_blob fits the best.
> Of course we were not planning to add a bunch of them to kernel tree.
> The idea was to add only _one_ such cat-able bpf prog and have it as
> a selftest for usermode_blob + bpf_iter. What we want our users to
> see in 'cat my_ipv6_route' is probably different from other companies.
> These patches will likely be using usermode_blob() in the next month.
>
> But we don't need to wait. We can make the progress right now.
> How about we remove bpfilter uapi and rename net/bpfilter/bpfilter_kern.c
> into net/umb/umb_test.c only to exercise Makefile to build elf file
> from simple main.c including .S with incbin trick
> and kernel side that does fork_usermode_blob().
> And that's it.
> net/ipv4/bpfilter/sockopt.c and kconfig can be removed.
> That would be enough base to do use cases 2 and 3 above.
> Having such selftest will be enough to adjust the layering
> for fork_usermode_blob(), right?

If I understand correctly you are asking people to support out of tree
code.  I see some justification for this functionality for in-tree
code.  For out of tree code there really is no way to understand,
support, or maintain the code.

We probably also need to have a conversation about why this
functionality is a better choice than using a compiled in initramfs,
such as can be had by setting CONFIG_INITRAMFS_SOURCE.

Even with this write up and the conversations so far I don't understand
what problem fork_usermode_blob is supposed to be solving.

Is there anything kernel version dependent about bpf_lsm?  For me the
primary justification of something like fork_usermode_blob is something
that is for all practical purposes a kernel module but it just happens
to run in usermode.  From what little I know about bpf_lsm that isn't
the case.

So far all you have mentioned is that bpf_lsm needs to load early.  That
seems like something that could be solved by a couple of lines in
init/main.c that fork and exec a program before init if it is present.
Maybe that also needs a bit of protection so the bootloader can't
override the binary.

The entire concept of a loadable lsm has me scratching my head.  Last
time that concept was seriously looked at, the races for initializing
per object data were difficult enough to deal with that modular support
was removed from all of the existing lsms.  Not to mention there are
places where the lsm hooks are a pretty lousy API and will be refactored
to make things better with no thought of any out of tree code.

> If I understood you correctly you want to replace pid_t
> in 'struct umh_info' with proper 'struct pid' pointer that
> is refcounted, so user process's exit is clean? What else?

No "if (filename)" or "if (file)" on the exec code paths.

No extra case for the LSMs to have to deal with.
Nothing fork_usermode_blob does is something that can't be done from
userspace as far as execve is concerned, so there is no justification
for any special cases in the core of the exec code.

Getting the deny_write_count and the reference count correct on the
file argument as well as getting BINPRM_FLAGS_PATH_INACCESSIBLE set.

Using the proper type for argv and envp.

Those are the things I know of that need to be addressed.

Getting the code refactored so that do_open_execat can be called in
do_execveat_common instead of __do_execve_file is enough of a code
motion challenge that I really would rather not do it.  Unfortunately
that is the only way I can see right now to have both
do_execveat_common and do_execve_file pass in a struct file.  Calling
deny_write_access and get_file in do_execve_file, and probably a bit
more, is the only way I can see to cleanly isolate the special cases
fork_usermode_blob brings to the table.

Strictly speaking I am also aware of the issue that the kernel has to
use set_fs(KERNEL_DS) to allow argv and envp to exist in kernel space
instead of userspace.  That needs to be fixed as well, but for all
kernel uses of exec.  So any work fixing fork_usermode_blob can ignore
that issue.

Eric