Re: Isolating abstract sockets

Boris Lukashev <blukashev@xxxxxxxxxxxxxxxx> · Tue, 24 Oct 2023 10:05:29 -0400

Namespacing at OSI layer 4 seems a bit fraught, as the underlying route, MAC, and physdev fall outside the caller's control. Multiple namespaces sharing an IP stack would exhaust ephemeral port ranges faster (likely asymmetrically, too) and would have bound-socket collisions that are opaque to each other, requiring handling outside the namespace's or container's purview. We looked at this sort of thing during the R&D phase of our assured comms work (namespaces were young) and found a bunch of overhead and collision concerns. Not saying it can't be done, but getting consumers to play nice enough with such an approach may be a heavy lift.

Thanks,
-Boris

On October 24, 2023 9:46:08 AM EDT, "Serge E. Hallyn" <serge@xxxxxxxxxx> wrote:

On Sun, Dec 18, 2022 at 08:29:10PM +0100, Stefan Bavendiek wrote:
When building userspace application sandboxes, one issue that does not seem trivial to solve is the isolation of abstract sockets.

Veeery late reply.  Have you had any productive discussions about this in
other threads or venues?

While most IPC mechanisms can be isolated by mechanisms like mount namespaces, abstract sockets are part of the network namespace.
It is possible to isolate abstract sockets by using a new network namespace, however, unprivileged processes can only create a new empty network namespace, which removes network access as well and makes this useless for network clients.
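
To illustrate why mount namespaces do not help here, the following is a minimal Python sketch (Linux only) of an abstract socket round trip. The function name abstract_roundtrip and the socket name are hypothetical and chosen only for this example. The address begins with a NUL byte, so no filesystem path is involved and any process in the same network namespace that knows the name can connect.

```python
import socket


def abstract_roundtrip(name: bytes) -> bytes:
    """Bind an abstract Unix socket, connect to it once, and return the
    address the kernel reports for the listening socket."""
    # A leading NUL byte selects the abstract namespace (Linux only).
    addr = b"\0" + name
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(addr)
        server.listen(1)
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            # The lookup is by name within the network namespace;
            # no file is created and no mount is consulted.
            client.connect(addr)
            conn, _ = server.accept()
            conn.close()
        return server.getsockname()


if __name__ == "__main__":
    print(abstract_roundtrip(b"example-abstract-socket"))
```

Because nothing is created on disk, changing the mount namespace or the file permissions seen by the client has no effect on whether the connect succeeds.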

Some Linux sandbox projects try to solve this by bridging the existing network interfaces into the new namespace, or use something like slirp4netns to achieve this, but this does not look like an ideal solution to this problem, especially since sandboxing should reduce the kernel attack surface without introducing more complexity.

Aside from containers using namespaces, sandbox implementations based on seccomp and Landlock would also run into the same problem: Landlock only provides file system isolation, and seccomp cannot filter the path argument, so it can only be used to block new Unix domain socket connections completely.

Currently there does not seem to be any way to disable network namespaces in the kernel without also disabling unix domain sockets.

The question is how to solve the issue of abstract socket isolation in a clean and efficient way, possibly even without namespaces.
What would be the ideal way to implement a mechanism to disable abstract sockets, either globally or, even better, in the context of a process?
And would such a patch have a realistic chance to make it into the kernel?

Disabling them altogether would break lots of things depending on them,
like X :)  (@/tmp/.X11-unix/X0).  The other path is to reconsider network
namespaces.  There are several directions this could lead.  For one, as
Dinesh Subhraveti often points out, the current "network" namespace is
really a network device namespace.  If we instead namespace at the
bind/connect/etc calls, we end up with much different abilities.  You
can implement something like this today using seccomp-filter.
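
As a rough illustration of what a check at the bind/connect level might look like, here is a minimal Python sketch of the policy decision only. It assumes a supervisor process (for example one using seccomp user notification) has already copied the raw sockaddr bytes out of the sandboxed process, since a seccomp BPF filter by itself cannot dereference the pointer. The names classify_sockaddr and allow_call and the per-sandbox prefix scheme are hypothetical, not an existing API.

```python
import struct

AF_UNIX = 1          # value of AF_UNIX on Linux
SUN_PATH_OFFSET = 2  # sun_path follows the 2-byte sa_family_t on Linux


def classify_sockaddr(raw: bytes) -> str:
    """Classify raw sockaddr bytes as 'abstract', 'pathname',
    'unnamed', or 'other' (not a Unix domain address)."""
    if len(raw) < SUN_PATH_OFFSET:
        return "other"
    (family,) = struct.unpack_from("=H", raw)
    if family != AF_UNIX:
        return "other"
    path = raw[SUN_PATH_OFFSET:]
    if not path:
        return "unnamed"
    # Abstract addresses are marked by a NUL as the first byte of sun_path.
    return "abstract" if path[0] == 0 else "pathname"


def allow_call(raw: bytes, allowed_prefix: bytes) -> bool:
    """Hypothetical policy: abstract names are allowed only under the
    sandbox's own prefix; every other address kind is left alone."""
    if classify_sockaddr(raw) != "abstract":
        return True
    name = raw[SUN_PATH_OFFSET + 1:]  # skip the leading NUL
    return name.startswith(allowed_prefix)


if __name__ == "__main__":
    x11 = struct.pack("=H", AF_UNIX) + b"\0/tmp/.X11-unix/X0"
    print(allow_call(x11, b"sandbox1/"))
```

A prefix check like this denies access to abstract names outside the sandbox while leaving pathname sockets and IP traffic untouched, which is the kind of different ability that moving the boundary to bind/connect would give. A real supervisor would also have to deal with the race between reading the address and the kernel acting on it.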

-serge