On Wed, Jan 19, 2022 at 06:15:22PM +0100, Alexey Gladkov wrote:
> On Wed, Jan 19, 2022 at 05:24:23PM +0100, Christian Brauner wrote:
> > On Wed, Jan 19, 2022 at 06:48:03PM +0300, Alexey Dobriyan wrote:
> > > From 61376c85daab50afb343ce50b5a97e562bc1c8d3 Mon Sep 17 00:00:00 2001
> > > From: Alexey Dobriyan <adobriyan@xxxxxxxxx>
> > > Date: Mon, 22 Nov 2021 20:41:06 +0300
> > > Subject: [PATCH 1/1] proc: "mount -o lookup=..." support
> > >
> > > Docker implements MaskedPaths configuration option
> > >
> > > https://github.com/estesp/docker/blob/9c15e82f19b0ad3c5fe8617a8ec2dddc6639f40a/oci/defaults.go#L97
> > >
> > > to disable certain /proc files. It overmounts them with /dev/null.
> > >
> > > Implement proper mount option which selectively disables lookup/readdir
> > > in the top level /proc directory so that MaskedPaths doesn't need
> > > to be updated as time goes on.
> >
> > I might've missed this when this was sent the last time so maybe it was
> > clearly explained in an earlier thread: What's the reason this needs to
> > live in the kernel?
> >
> > The MaskedPaths entry is optional so runtimes aren't required to block
> > anything by default and this mostly makes sense for workloads that run
> > privileged.
> >
> > In addition MaskedPaths is a generic option which allows to hide any
> > existing path, not just proc. Even in the very docker-specific defaults
> > /sys/firmware is covered.
> >
> > I do see clear value in the subset= and hidepid= options. They are
> > generally useful independent of opinionated container workloads. I don't
> > see the same for lookup=.
> >
> > An alternative I find more sensible is to add a new value for subset=
> > that hides anything(?) that only global root should have read/write
> > access too.
>
> Or we can allow to change permissions in the procfs only in the direction
> of decreasing (if some file has 644 then allow to set 640 or 600). In this
> case, we will not need to constantly check the whitelist.
I don't fancy any filtering or allowlist approach; I find it rather
inelegant.

But if I understand you correctly, with decrease-only permissions we could
allow a (namespace) procfs-admin to set permissions so that the relevant
files are effectively read-only, or not readable at all, for container
workloads. Once you've lowered the permissions you can't raise them again,
which ensures that even a namespace procfs-admin can't undo the
restriction. That might work as well. But it implies that we wouldn't need
any allowlist at all afaict.