Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes: > On Sat, Jun 06, 2020 at 03:33:14PM -0700, Linus Torvalds wrote: >> On Sat, Jun 6, 2020 at 1:20 PM Alexei Starovoitov >> <alexei.starovoitov@xxxxxxxxx> wrote: >> > >> > Please mention specific bugs and let's fix them. >> >> Well, Eric did mention one explicit bug, and several "looks dodgy" bugs. >> >> And the fact is, this isn't used. >> >> It's clever, and I like the concept, but it was probably a mistake to >> do this as a user-mode-helper thing. >> >> If people really convert netfilter rules to bpf, they'll likely do so >> in user space. This bpfilter thing hasn't gone anywhere, and it _has_ >> caused problems. >> >> So Alexei, I think the burden of proof is not on Eric, but on you. >> >> Eric's claim is that >> >> (a) it has bugs (and yes, he pointed to at lelast one) > > the patch from March 12 ? > I thought it landed long ago. Is there an issue with it? > 'handling is questionable' is not very constructive. It was half a fix. Tetsuo still doesn't know how to fix tomoyo to work with fork_usermode_blob. He was asking for your feedback and you did not give it. The truth is Tetsuo's fix was only a fix for the symptoms. It was not a good fix to the code. >> (b) it's not doing anything useful > > true. > >> (b) it's a maintenance issue for execve, which is what Eric maintains. > > I'm not aware of execve issues. I don't remember being cc-ed on them. > To me this 'lets remove everything' patch comes out of nowhere with > a link to three month old patch as a justification. I needed to know how dead the code is and your reply has confirmed that the code is dead. Deleting the code is much easier than the detailed careful work it would take to make code that is in use work correctly. >> So you can't just dismiss this, ignore the reported bug, and say >> "we'll fix them". >> >> That only answers (a) (well, it _would_ have answered (a)., except you >> actually didn't even read Eric's report of existing bugs). >> >> What is your answer to (b)-(c)? > > So far we had two attempts at converting netfilter rules to bpf. Both ended up > with user space implementation and short cuts. bpf side didn't have loops and > couldn't support 10k+ rules. That is what stalled the effort. imo it's a > pointless corner case, but to be a true replacement people kept bringing it up > as something valid. Now we have bpf iterator concept and soon bpf will be able > to handle millions of rules. Also folks are also realizing that this effort has > to be project managed appropriately. Will it materialize in patches tomorrow? > Unlikely. Probably another 6 month at least. Also outside of netfilter > conversion we've started /proc extension effort that will use the same umh > facility. It won't be ready tomorrow as well, but both need umh. Given that I am one of the folks who looks after proc I haven't seen that either. The direction I have seen in the last 20 years is people figuring out how to reduce proc not really how to extend it so I can't imagine what a /proc extension effort is. > initrd is not > an option due to operational constraints. We need a way to ship kernel tarball > where bpf things are ready at boot. I suspect /proc extensions patches will > land sooner. Couple month ago people used umh to do ovs->xdp translatation. It > didn't land. People argued that the same thing can be achieved in user space > and they were correct. So you're right that for most folks user space is the > answer. But there are cases where kernel has to have these things before > systemd starts. You may have a valid case for doing things in the kernel before systemd starts. The current mechanism is fundamentally in conflict with the LSMs which is an unresolved problem. I don't see why you can't have a userspace process that does: pid = fork(); if (pid == 0) { /* Do bpf stuff */ } else if (pid > 0) { execve("/sbin/init", ...); } You can build an initramfs with that code right into the kernel, so I can't imagine the existing mechanisms being insufficient. That said the fork_usermode_blob code needs to be taken out and rewritten so as not to impose a burden on the rest of the code. There is no reason why code that is called only one time can not allocate a filename and pass it to __do_execve_file. There is no reason to allow modules access to any of that functionality if you need something before an initramfs can be processed. exit_umh() is completely unnecessary all that is needed is a reference to a struct pid. There are all of these layers and abstractions but with only the single user in net/bpfilter/bpfilter_kern.c they all appear to have been jumbled together without good layering inbetween then. That is just what I see from looking at the code quickly. All of those problems need to be addressed before fork_usermode_blob grows any real users. As for other users that want to use for_usermode_blob in the future currently the code in the kernel is not at all straightforward to tell if it is correct or not. So before it grows any living users the code need to be rewitten so that it is easy to tell that it is correct. I have sympathy with your efforts but since the code is currently dead, and in need of work. I will write a good version of removing CONFIG_BPFILTER_UMH on top of -rc1, leaving CONFIG_BPFILTER. That should give you a solid foundation to build upon, while making the kernel maintainble. Eric