device namespaces

"Enrico Weigelt, metux IT consult" <lkml@xxxxxxxxx> · Tue, 8 Jun 2021 11:38:16 +0200

Hello folks,

I'm going to implement device namespaces, where containers can get an
entirely different view of the devices in the machine (usually just a
specific subset, but possibly additional virtual devices).

To start, I'd like to add a simple mapping of device major/minor numbers
(leaving aside sysfs, udev, etc). An important requirement for me is that
the parent ns can choose to delegate devices from those it has full access
to (child namespaces can do the same for their children), and that the
assignment can change (for simplicity, I'm ignoring the case of removing
devices that are already opened by some process; I haven't decided yet
whether they should be forcefully closed or whether keeping them open is a
valid use case).
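Here is a minimal sketch of such a per-namespace maj/min table. It is written as plain userspace C so it can be compiled and tried outside the kernel. Everything in it is hypothetical and is not an existing kernel interface: struct devns, devns_set(), devns_resolve(), the DEVNS_READ/DEVNS_WRITE bits, and the fixed table size. Each namespace maps the maj:min its processes see to a maj:min in its parent. Resolving a device walks up to the initial namespace and keeps only the access that every level grants, so a child can re-delegate to its own children without ever gaining rights.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical access bits; not an existing kernel interface. */
enum { DEVNS_READ = 1u << 0, DEVNS_WRITE = 1u << 1 };

/* One delegation: the maj:min a child namespace sees, the maj:min it
 * refers to in the parent namespace, and the access granted. */
struct devns_entry {
    unsigned int child_major, child_minor;
    unsigned int parent_major, parent_minor;
    unsigned int access;
};

#define DEVNS_MAX_ENTRIES 16

/* Hypothetical device namespace: parent pointer plus a small table.
 * The initial namespace (parent == NULL) sees every device unchanged;
 * a new child starts with n == 0, i.e. it sees nothing. */
struct devns {
    const struct devns *parent;
    size_t n;
    struct devns_entry map[DEVNS_MAX_ENTRIES];
};

/* Add a mapping, or replace it if the child maj:min is already mapped,
 * since assignments are supposed to be changeable. */
static bool devns_set(struct devns *ns, unsigned int cmaj, unsigned int cmin,
                      unsigned int pmaj, unsigned int pmin, unsigned int access)
{
    for (size_t i = 0; i < ns->n; i++) {
        struct devns_entry *e = &ns->map[i];
        if (e->child_major == cmaj && e->child_minor == cmin) {
            e->parent_major = pmaj;
            e->parent_minor = pmin;
            e->access = access;
            return true;
        }
    }
    if (ns->n == DEVNS_MAX_ENTRIES)
        return false; /* table full */
    ns->map[ns->n++] = (struct devns_entry){ cmaj, cmin, pmaj, pmin, access };
    return true;
}

/* Translate a maj:min seen in ns into the real device, walking up to the
 * initial namespace and intersecting the access allowed at each level.
 * Returns false if some level has no mapping (device is invisible). */
static bool devns_resolve(const struct devns *ns, unsigned int major,
                          unsigned int minor, unsigned int *real_major,
                          unsigned int *real_minor, unsigned int *access)
{
    unsigned int allowed = DEVNS_READ | DEVNS_WRITE;

    for (; ns->parent != NULL; ns = ns->parent) {
        const struct devns_entry *hit = NULL;
        for (size_t i = 0; i < ns->n; i++) {
            if (ns->map[i].child_major == major &&
                ns->map[i].child_minor == minor) {
                hit = &ns->map[i];
                break;
            }
        }
        if (hit == NULL)
            return false;
        allowed &= hit->access;
        major = hit->parent_major;
        minor = hit->parent_minor;
    }
    *real_major = major;
    *real_minor = minor;
    *access = allowed;
    return true;
}
```

As an example, suppose the child maps 8:0 to the parent's 8:16 with rw, and a grandchild maps its own 8:0 to the child's 8:0 read-only. The grandchild's 8:0 then resolves to 8:16 with read access only. Any maj:min that is missing from a table along the way stays invisible.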

The big question for me now is how exactly to do the table maintenance
from userland. We already have entries in /proc/<pid>/ns/*. I'm thinking
about using them as a command channel, like this:

* new child namespaces are created with an empty mapping
* mapping manipulation is done by just writing commands to the ns file
* access is only granted if the writing process itself is in the
 parent's device ns and has CAP_SYS_ADMIN (or maybe there could be some
 admin user for the ns? or the 'root' of the corresponding user_ns?)
* if the caller has restrictions on a particular device, these are
 automatically applied to the child as well (e.g. if you're restricted
 to read-only, you can't give rw to the child ns).
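The command channel could be sketched roughly as follows, again as hypothetical userspace C. The line format ("map C:c P:p r|rw" and "unmap C:c"), struct devns_caller, and both functions are made up for illustration. A real implementation would take the credentials from the writing task, and the caller's own access from its device ns table, rather than from arguments. The sketch shows the two policy points from the list. First, only a CAP_SYS_ADMIN writer in the parent ns may change the table. Second, the requested access is intersected with what the writer itself holds.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical access bits and command format; not a kernel interface. */
enum { DEVNS_READ = 1u << 0, DEVNS_WRITE = 1u << 1 };

enum devns_op { DEVNS_OP_MAP, DEVNS_OP_UNMAP };

struct devns_cmd {
    enum devns_op op;
    unsigned int child_major, child_minor;
    unsigned int parent_major, parent_minor; /* MAP only */
    unsigned int access;                     /* MAP only */
};

/* Hypothetical summary of the writer's credentials. */
struct devns_caller {
    bool in_parent_ns;  /* writer lives in the parent device ns */
    bool cap_sys_admin; /* writer has CAP_SYS_ADMIN */
};

static bool only_trailing_space(const char *s)
{
    while (*s == ' ' || *s == '\t' || *s == '\n')
        s++;
    return *s == '\0';
}

/* Parse "map C:c P:p r|rw" or "unmap C:c" as written to the ns file.
 * Returns false on any syntax error. */
static bool devns_parse(const char *line, struct devns_cmd *cmd)
{
    char mode[3] = "";
    int end = 0;

    if (sscanf(line, "map %u:%u %u:%u %2s%n", &cmd->child_major,
               &cmd->child_minor, &cmd->parent_major, &cmd->parent_minor,
               mode, &end) == 5) {
        if (!only_trailing_space(line + end))
            return false; /* e.g. "rwx" or extra words */
        if (strcmp(mode, "r") == 0)
            cmd->access = DEVNS_READ;
        else if (strcmp(mode, "rw") == 0)
            cmd->access = DEVNS_READ | DEVNS_WRITE;
        else
            return false;
        cmd->op = DEVNS_OP_MAP;
        return true;
    }
    if (sscanf(line, "unmap %u:%u%n", &cmd->child_major, &cmd->child_minor,
               &end) == 2 && only_trailing_space(line + end)) {
        cmd->op = DEVNS_OP_UNMAP;
        cmd->access = 0;
        return true;
    }
    return false;
}

/* Enforce the proposed policy. caller_access is what the writer itself
 * may do with the parent-side device; the child never gets more. */
static bool devns_authorize(const struct devns_caller *caller,
                            unsigned int caller_access, struct devns_cmd *cmd)
{
    if (!caller->in_parent_ns || !caller->cap_sys_admin)
        return false;
    if (cmd->op == DEVNS_OP_MAP) {
        cmd->access &= caller_access; /* rw requested, r held -> r */
        if (cmd->access == 0)
            return false;             /* nothing left to delegate */
    }
    return true;
}
```

Clamping the access silently, rather than rejecting the command, follows the "automatically applied" wording above. Returning an error instead would be an equally reasonable choice.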

Is this a good way to go? Or what would be a better one?

--mtx

--
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@xxxxxxxxx -- +49-151-27565287