On Wed, Sep 22, 2021, at 8:52 AM, Christian Brauner wrote:
> On Wed, Sep 22, 2021 at 08:34:23AM -0700, Andy Lutomirski wrote:
>> On Wed, Sep 22, 2021, at 5:25 AM, Christian Brauner wrote:
>> > On Mon, Sep 20, 2021 at 11:36:47AM -0700, Andy Lutomirski wrote:
>> >> On Mon, Sep 20, 2021 at 11:16 AM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>> >> >
>> >> > On Mon, Sep 20, 2021 at 04:51:19PM +0200, Thomas Weißschuh wrote:
>> >>
>> >> > > > Do you mean it literally invokes /sbin/modprobe? If so, hooking this
>> >> > > > at /sbin/modprobe and calling out to the container manager seems like
>> >> > > > a decent solution.
>> >> > >
>> >> > > Yes it does. Thanks for the idea, I'll see how this works out.
>> >> >
>> >> > Would documentation guiding you in that way have helped? If so
>> >> > I welcome a patch that does just that.
>> >>
>> >> If someone wants to make this classy, we should probably have the
>> >> container counterpart of a standardized paravirt interface. There
>> >> should be a way for a container to, in a runtime-agnostic way, issue
>> >> requests to its manager, and requesting a module by (name, Linux
>> >> kernel version for which that name makes sense) seems like an
>> >> excellent use of such an interface.
>> >
>> > I always thought of this in two ways we currently do this:
>> >
>> > 1. Caller transparent container manager requests.
>> > This is the seccomp notifier where we transparently handle syscalls
>> > including intercepting init_module() where we parse out the module to
>> > be loaded from the syscall args of the container and if it is
>> > allow-listed load it for the container otherwise continue the syscall
>> > letting it fail or failing directly through seccomp return value.
>>
>> Specific problems here include aliases and dependencies. My modules.alias file, for example, has:
>>
>> alias net-pf-16-proto-16-family-wireguard wireguard
>>
>> If I do modprobe net-pf-16-proto-16-family-wireguard, modprobe parses some files in /lib/modules/`uname -r` and issues init_module() asking for 'wireguard'. So hooking init_module() is at the wrong layer -- for that to work, the container's /sbin/modprobe needs to already have figured out that the desired module is wireguard and have a .ko for it.
>
> You can't use the container's .ko module. For this you would need to
> trust the image that the container wants you to load. The container
> manager should always load a host module.
>

Agreed.

>>
>> >
>> > 2. A process in the container explicitly calling out to the container
>> > manager.
>> > One example how this happens is systemd-nspawn via dbus messages
>> > between systemd in the container and systemd outside the container to
>> > e.g. allocate a new terminal in the container (kinda insecure but
>> > that's another issue) or other stuff.
>> >
>> > So what was your idea: would it be like a device file that could be
>> > exposed to the container where it writes requests to the container
>> > manager? What would be the advantage to just standardizing a socket
>> > protocol which is what we do for example (it doesn't do module loading
>> > of course as we handle that differently):
>>
>> My idea is standardizing *something*. I think it would be nice if, for example, distros could ship a /sbin/modprobe that would do the right thing inside any compliant container runtime as well as when running outside a container.
>>
>> I suppose container managers could also bind-mount over /sbin/modprobe, but that's more intrusive.
>
> I don't see this is a big issue because that is fairly trivial.
> I think we never want to trust the container's modules.
> What probably should be happening is that the manager exposes a list of
> modules the container can request in some form. We have precedent for
> doing something like this.
> So now modprobe and similar tools can be made aware that if they are in
> a container they should request that module from the container manager
> be it via a socket request or something else.
> Nesting will be a bit funny but can probably be made to work by just
> bind-mounting the outermost socket into the container or relaying the
> request.

Why bother with a list? I think it should be sufficient for the container to ask for a module and either get it or not get it.
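
To make the "ask and either get it or not" idea concrete, here is a rough sketch of what such an exchange could look like. Everything in it is an assumption for illustration only: the one-line wire format, the "ok"/"denied" replies, and the names resolve(), serve_one() and demo() are hypothetical, not an existing interface. The manager resolves the requested name against the host's alias data (assuming shell-style wildcard matching of "alias <pattern> <module>" lines) and applies its own policy, so the container never supplies a .ko and never sees a list.

```python
import fnmatch
import socket

# Hypothetical stand-in for the host's modules.alias; the alias line is the
# one quoted in the thread.
HOST_ALIASES = """\
# Aliases extracted from modules themselves.
alias net-pf-16-proto-16-family-wireguard wireguard
"""


def resolve(name, alias_text):
    """Map a requested name to a module name via 'alias <pattern> <module>' lines."""
    for line in alias_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "alias":
            if fnmatch.fnmatchcase(name, fields[1]):
                return fields[2]
    return name  # no alias matched: treat the request as a module name


def serve_one(conn, alias_text, policy, load):
    """Manager side: read one request, resolve it on the host, answer yes or no."""
    # A real protocol would need proper message framing; one short recv is
    # enough for this sketch.
    requested = conn.recv(256).decode().strip()
    module = resolve(requested, alias_text)
    allowed = policy(module)
    if allowed:
        load(module)  # a real manager would load the host's copy of the module here
    conn.sendall(b"ok\n" if allowed else b"denied\n")


def demo(name):
    """Container side plus manager side over a socketpair, in one process."""
    client, manager = socket.socketpair()
    loaded = []
    with client, manager:
        client.sendall(name.encode() + b"\n")       # the container asks
        serve_one(manager, HOST_ALIASES,
                  lambda module: module == "wireguard",  # hypothetical policy
                  loaded.append)
        reply = client.recv(16).decode().strip()     # and gets it or not
    return reply, loaded


if __name__ == "__main__":
    print(demo("net-pf-16-proto-16-family-wireguard"))
    print(demo("some-other-module"))
```

In a real setup the socketpair would be replaced by whatever channel the runtime exposes (for instance a socket bind-mounted into the container), and load() would invoke the host's modprobe so that dependencies are handled on the host side too.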