On Wed, Jan 19, 2022 at 06:31:07PM +0100, Christian Brauner wrote:
> On Wed, Jan 19, 2022 at 06:15:22PM +0100, Alexey Gladkov wrote:
> > On Wed, Jan 19, 2022 at 05:24:23PM +0100, Christian Brauner wrote:
> > > On Wed, Jan 19, 2022 at 06:48:03PM +0300, Alexey Dobriyan wrote:
> > > > From 61376c85daab50afb343ce50b5a97e562bc1c8d3 Mon Sep 17 00:00:00 2001
> > > > From: Alexey Dobriyan <adobriyan@xxxxxxxxx>
> > > > Date: Mon, 22 Nov 2021 20:41:06 +0300
> > > > Subject: [PATCH 1/1] proc: "mount -o lookup=..." support
> > > >
> > > > Docker implements MaskedPaths configuration option
> > > >
> > > > 	https://github.com/estesp/docker/blob/9c15e82f19b0ad3c5fe8617a8ec2dddc6639f40a/oci/defaults.go#L97
> > > >
> > > > to disable certain /proc files. It overmounts them with /dev/null.
> > > >
> > > > Implement proper mount option which selectively disables lookup/readdir
> > > > in the top level /proc directory so that MaskedPaths doesn't need
> > > > to be updated as time goes on.
> > >
> > > I might've missed this when this was sent the last time so maybe it was
> > > clearly explained in an earlier thread: What's the reason this needs to
> > > live in the kernel?
> > >
> > > The MaskedPaths entry is optional so runtimes aren't required to block
> > > anything by default and this mostly makes sense for workloads that run
> > > privileged.
> > >
> > > In addition MaskedPaths is a generic option which allows to hide any
> > > existing path, not just proc. Even in the very docker-specific defaults
> > > /sys/firmware is covered.
> > >
> > > I do see clear value in the subset= and hidepid= options. They are
> > > generally useful independent of opinionated container workloads. I don't
> > > see the same for lookup=.
> > >
> > > An alternative I find more sensible is to add a new value for subset=
> > > that hides anything(?) that only global root should have read/write
> > > access too.
> >
> > Or we can allow to change permissions in the procfs only in the direction
> > of decreasing (if some file has 644 then allow to set 640 or 600). In this
> > case, we will not need to constantly check the whitelist.
>
> I don't fancy any filtering or allowlist approach. I find that rather
> inelegant.

Yep. I also don't find it very convenient if you need to allow more than
one or two files. That's why I didn't do anything like that when I
implemented subset=.

> But if I understand you correctly is that if we were to have
> decreasing permissions we could allow a (namespace) procfs-admin to set
> permissions so that the relevant files are essentially read-only or not
> even readable at all for container workloads. So once you've lowered
> perms you can't raise them which ensures even namespace procfs-admin
> can't raise them again.

Yes. This is what I meant.

> Might work as well. But that implies that we wouldn't need any allowlist
> at all afaict.

Yes, in this case we don't need a list.

-- 
Rgrds, legion
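
For illustration only, the "decreasing only" rule discussed above (644 may
become 640 or 600, never the reverse) can be sketched as a plain userspace
check. This is not taken from any patch in this thread; the helper name
mode_only_decreases() is hypothetical, and the sketch assumes only the
classic rwx bits for user, group and other matter.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not a kernel API): return true when new_mode
 * grants no permission bit that old_mode does not already grant.
 * Only the rwx bits for user/group/other are compared.
 */
static bool mode_only_decreases(mode_t old_mode, mode_t new_mode)
{
	const mode_t perm_mask = S_IRWXU | S_IRWXG | S_IRWXO;
	mode_t old_perm = old_mode & perm_mask;
	mode_t new_perm = new_mode & perm_mask;

	/* Any bit set in new_perm but clear in old_perm would be an increase. */
	return (new_perm & ~old_perm) == 0;
}
```

Because the check always compares against the current mode, a file lowered
from 0644 to 0600 could not be moved back to 0640 or 0644 afterwards, which
is the property that makes a separate allowlist unnecessary.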