On Wed, Jun 10, 2020 at 04:12:29PM -0500, Eric W. Biederman wrote: > Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes: > > > On Tue, Jun 09, 2020 at 03:02:30PM -0500, Eric W. Biederman wrote: > >> Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes: > >> > >> > bpf_lsm is that thing that needs to load and start acting early. > >> > It's somewhat chicken and egg. fork_usermode_blob() will start a process > >> > that will load and apply security policy to all further forks and > >> > execs. > >> > >> What is the timeframe for bpf_lsm patches wanting to use > >> fork_usermode_blob()? > >> > >> Are we possibly looking at something that will be ready for the next > >> merge window? > > > > In bpf space there are these that want to use usermode_blobs: > > 1. bpfilter itself. > > First of all I think we made a mistake delaying landing the main patches: > > https://lore.kernel.org/patchwork/patch/902785/ > > https://lore.kernel.org/patchwork/patch/902783/ > > without them bpfilter is indeed dead. That probably was the reason > > no one was brave enough to continue working on it. > > So I think the landed skeleton of bpfilter can be removed. > > I think no user space code will notice that include/uapi/linux/bpfilter.h > > is gone. So it won't be considered as user space breakage. > > Similarly CONFIG_BPFILTER can be nuked too. > > bpftool is checking for it (see tools/bpf/bpftool/feature.c) > > but it's fine to remove it. > > I still think that the approach taken was a correct one, but > > lifting that project off the ground was too much for three of us. > > So when it's staffed appropriately we can re-add that code. > > > > 2. bpf_lsm. > > It's very active at the moment. I'm working on it as well > > (sleepable progs is targeting that), but I'm not sure when folks > > would have to have it during the boot. So far it sounds that > > they're addressing more critical needs first. "bpf_lsm ready at boot" > > came up several times during "bpf office hours" conference calls, > > so it's certainly on the radar. If I to guess I don't think > > bpf_lsm will use usermode_blobs in the next 6 weeks. > > More likely 2-4 month. > > > > 3. bpf iterator. > > It's already capable extension of several things in /proc. > > See https://lore.kernel.org/bpf/20200509175921.2477493-1-yhs@xxxxxx/ > > Cat-ing bpf program as "cat /sys/fs/bpf/my_ipv6_route" > > will produce the same human output as "cat /proc/net/ipv6_route". > > The key difference is that bpf is all tracing based and it's unstable. > > struct fib6_info can change and prog will stop loading. > > There are few FIXME in there. That is being addressed right now. > > After that the next step is to make cat-able progs available > > right after boot via usermode_blobs. > > Unlike cases 1 and 2 here we don't care that they appear before pid 1. > > They can certainly be chef installed and started as services. > > But they are kernel dependent, so deploying them to production > > is much more complicated when they're done as separate rpm. > > Testing is harder and so on. Operational issues pile up when something > > that almost like kernel module is done as a separate package. > > Hence usermode_blob fits the best. > > Of course we were not planning to add a bunch of them to kernel tree. > > The idea was to add only _one_ such cat-able bpf prog and have it as > > a selftest for usermode_blob + bpf_iter. What we want our users to > > see in 'cat my_ipv6_route' is probably different from other companies. > > These patches will likely be using usermode_blob() in the next month. > > > > But we don't need to wait. We can make the progress right now. > > How about we remove bpfilter uapi and rename net/bpfilter/bpfilter_kern.c > > into net/umb/umb_test.c only to exercise Makefile to build elf file > > from simple main.c including .S with incbin trick > > and kernel side that does fork_usermode_blob(). > > And that's it. > > net/ipv4/bpfilter/sockopt.c and kconfig can be removed. > > That would be enough base to do use cases 2 and 3 above. > > Having such selftest will be enough to adjust the layering > > for fork_usermode_blob(), right? > > If I understand correctly you are asking people to support out of tree > code. I see some justification for this functionality for in-tree code. > For out of tree code there really is no way to understand support or > maintain the code. It's just like saying that sys_finit_module() is there to support out of tree code. There are in- and out- tree modules and there will be in- and out- of tree bpf programs, but the focus is on those that are relevant for the long term future of the kernel. The 1 case above is in-tree only. There is nothing in bpfilter that makes sense out of tree. The 2 case (bpf_lsm) is primarily in-tree. Security is something everyone wants its own way, but majority of bpf_lsm functionality should live in-tree. The 3 case is mostly out-of-tree. If there was obvious way to extend /proc it could have been in-tree, but no one will agree. > We probably also need to have a conversation about why this > functionality is a better choice that using a compiled in initramfs, > such as can be had by setting CONFIG_INITRAMFS_SOURCE. I explained it several times already. I don't see how initramfs solves 1 and 2. > Even with this write up and the conversations so far I don't understand > what problem fork_usermode_blob is supposed to be solving. Is there > anything kernel version dependent about bpf_lsm? For me the primary > justification of something like fork_usermode_blob is something that is > for all practical purposes a kernel module but it just happens to run in > usermode. that's what it is. It's a kernel module that runs in user space. > From what little I know about bpf_lsm that isn't the case. So far all It is. > you have mentioned is that bpf_lsm needs to load early. That seems like > something that could be solved by a couple of lines init/main.c that > forks and exec's a program before init if it is present. Maybe that > also needs a bit of protection so the bootloader can't override the > binary. > > The entire concept of a loadable lsm has me scratching my head. Last > time that concept was seriously looked at the races for initializing per > object data were difficult enough to deal with modular support was > removed from all of the existing lsms. I'm not sure what races you're talking about. usermode_blob will interact with kernel via syscalls and other standard communication mechanism. > Not to mention there are places where the lsm hooks are a pretty lousy > API and will be refactored to make things better with no thought of any > out of tree code. I don't see how refactoring LSM hooks is relevant in this discussion. > > > If I understood you correctly you want to replace pid_t > > in 'struct umh_info' with proper 'struct pid' pointer that > > is refcounted, so user process's exit is clean? What else? > > No "if (filename)" or "if (file)" on the exec code paths. No extra case > for the LSM's to have to deal with. Nothing fork_usermode_blob does is > something that can't be done from userspace as far as execve is > concerned so there is no justification for any special cases in the core > of the exec code. Adding getname_kernel() instead of filename==NULL is trivial enough and makes sense as a cleanup. But where do you see 'if (file)' ? The correct 'file' pointer is passed from shmem_kernel_file_setup() all the way to exec. > Getting the deny_write_count and the reference count correct on the file > argument as well as getting BPRM_FLAGS_PATH_INACCESSIBLE set. There is no fd because there is no task, but there is a file. I think do_execve should assume BINPRM_FLAGS_PATH_INACCESSIBLE in this case. > Using the proper type for argv and envp. I guess that's going to be a part of other cleanup. > Those are the things I know of that need to be addressed. > > > Getting the code refactored so that the do_open_execat can be called > in do_execveat_common instead of __do_execve_file is enough of a > challenge of code motion I really would rather not do that. Unfortunately that is > the only way I can see right now to have both do_execveat_common and > do_execve_file pass in a struct file. The 'struct file' is there. Please take another look at the code. > Calling deny_write_access and get_file in do_execve_file and probably > a bit more is the only way I can see to cleanly isoloate the special > cases fork_usermode_blob brings to the table. > > > Strictly speaking I am also aware of the issue that the kernel has to > use set_fs(KERNEL_DS) to allow argv and envp to exist in kernel space > instead of userspace. That needs to be fixed as well, but for all > kernel uses of exec. So any work fixing fork_usermode_blob can ignore > that issue. well, this is the problem of usermodehelper_exec. usermode_blob doesn't use argv/envp. They could be NULL for all practical purpose.