On 2020-07-02, Christian Brauner <christian.brauner@xxxxxxxxxx> wrote: > On Thu, Jul 02, 2020 at 08:17:38AM -0500, Eric W. Biederman wrote: > > Matt Bennett <matt.bennett@xxxxxxxxxxxxxxxxxxx> writes: > > > > > Previously the connector functionality could only be used by processes running in the > > > default network namespace. This meant that any process that uses the connector functionality > > > could not operate correctly when run inside a container. This is a draft patch series that > > > attempts to now allow this functionality outside of the default network namespace. > > > > > > I see this has been discussed previously [1], but am not sure how my changes relate to all > > > of the topics discussed there and/or if there are any unintended side effects from my draft > > > changes. > > > > Is there a piece of software that uses connector that you want to get > > working in containers? > > > > I am curious what the motivation is because up until now there has been > > nothing very interesting using this functionality. So it hasn't been > > worth anyone's time to make the necessary changes to the code. > > Imho, we should just state once and for all that the proc connector will > not be namespaced. This is such a corner-case thing and has been > non-namespaced for such a long time without consistent push for it to be > namespaced combined with the fact that this needs quite some code to > make it work correctly that I fear we end up buying more bugs than we're > selling features. And realistically, you and I will end up maintaining > this and I feel this is not worth the time(?). Maybe I'm being too > pessimistic though. It would be nice to have the proc connector be namespaced, because it would allow you to have init systems that don't depend on cgroups to operate -- and it would allow us to have a subset of FreeBSD's kqueue functionality that doesn't exist today under Linux. However, arguably pidfds might be a better path forward toward implementing such events these days -- and is maybe something we should look into. All of that being said, I agree that doing this is going to be particularly hairy and likely not worth the effort. In particular, the proc connector is: * Almost entirely unused (and largely unknown) by userspace. * Fairly fundamentally broken right now (the "security feature" of PROC_CN_MCAST_LISTEN doesn't work because once there is one listener, anyone who opens an cn_proc socket can get all events on the system -- and if the process which opened the socket dies with calling PROC_CN_MCAST_IGNORE then that information is now always streaming). So if we end up supporting this, we'll need to fix those bugs too. * Is so deeply intertwined with netlink and thus is so deeply embedded with network namespaces (rather than pid namespaces) meaning that getting it to correctly handle shared-network-namespace cases is going to be a nightmare. I agree with Eric that this patchset looks like it doesn't approach the problem from the right angle (and thinking about how you could fix it makes me a little nervous). Not to mention that when I brought this up with the maintainer listed in MAINTAINERS a few years ago (soon after I posted [1]), I was told that they no longer maintain this code -- so whoever touches it next is the new maintainer. In 2017, I wrote that GNU Shepherd uses cn_proc, however I'm pretty sure (looking at the code now) that it wasn't true then and isn't true now (Shepherd seems to just do basic pidfile liveliness checks). So even the niche example I used then doesn't actually use cn_proc. [1]: https://lore.kernel.org/lkml/a2fa1602-2280-c5e8-cac9-b718eaea5d22@xxxxxxx/ -- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH <https://www.cyphar.com/>
Attachment:
signature.asc
Description: PGP signature
_______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers