On Thu, Jan 26, 2023 at 11:16:07AM +0100, Christian Brauner wrote:
> On Wed, Jan 25, 2023 at 03:36:28PM -0800, Andrew Morton wrote:
> > On Wed, 25 Jan 2023 16:28:47 +0100 Alexey Gladkov <legion@xxxxxxxxxx> wrote:
> >
> > > The patch expands subset= option. If the proc is mounted with the
> > > subset=allowlist option, the /proc/allowlist file will appear. This file
> > > contains the filenames and directories that are allowed for this
> > > mountpoint. By default, /proc/allowlist contains only its own name.
> > > Changing the allowlist is possible as long as it is present in the
> > > allowlist itself.
> > >
> > > This allowlist is applied in lookup/readdir so files that will create
> > > modules after mounting will not be visible.
> > >
> > > Compared to the previous patches [1][2], I switched to a special virtual
> > > file from listing filenames in the mount options.
> > >
> > Changlog doesn't explain why you think Linux needs this feature. The
> > [2/6] changelog hints that containers might be involved. IOW, please
> > fully describe the requirement and use-case(s).
> >
> > Also, please describe why /proc/allowlist is made available via a mount
> > option, rather than being permanently present.
> >
> > And why add to subset=, instead of a separate mount option.
> >
> > Does /proc/allowlist work in subdirectories? Like, permit presence of
> > /proc/sys/vm/compact_memory?
> >
> > I think the whole thing is misnamed, really. "allowlist" implies
> > access permissions. Some of the test here uses "visibility" and other
> > places use "presence", which are better. "presentlist" and
> > /proc/presentlist might be better. But why not simply /proc/contents?
>
> Currently, a lot of container runtimes - even if they mount a new procfs
> instance - overmount various procfs files and directories to ensure that
> they're hidden from the container workload. (The motivations for this
> are mixed and usually it's only needed for containers that run with the
> same privilege level as the host.)
>
> The consequence of overmounting is that we need to refuse mounting
> procfs again somewhere else otherwise the procfs instance might reveal
> files and directories that were supposed to be hidden.
>
> So this patchset moves the ability to hide entries into the kernel
> through an allowlist. This way you can hide files and directories while
> being able to mount procfs again because it will inherit the same
> allowlist.
>
> I get the motivation. The question is whether this belongs into the
> kernel at all. I'm unfortunately not convinced.
>
> This adds a lot of string parsing to procfs and I think we would also
> need to decide what a reasonable maximum limit for such allowlists would
> be.
> The data structure likely shouldn't be a linked list but at least an
> rbtree especially if the size isn't limited.

There is a limit. So far I've limited the file size to 128k. I think
this is a reasonable limit.

> But fundamentally I think it moves something that should be and
> currently is a userspace policy into the kernel which I think is wrong.

We don't have mechanisms to implement this userspace policy.
Overmounting is not a solution; it only plugs holes in the absence of
other ways to control the visibility of files in procfs.

> Sure you can't predict what files show up in procfs over time but then
> subset=pid is already your friend - even if not as fine-grained.
>
> If this where another simple subset style mount option that allowlists a
> bunch of well-known global proc files then sure. But making this
> dynamically configurable from userspace doesn't make sense to me. I
> mean, users could write /gobble/dy/gook into /proc/allowlist or use it
> to stash secrets or hashes or whatever as we have no way of figuring out
> whether the entry they allowlist does or will actually ever exist.

BTW, I only allow printable data to be written to the file. We could
also make this file write-only, and then writing any extraneous data
there would not make sense.

> In general, such flexibility belongs into userspace imho.
>
> Frankly, if that is really required it would almost make more sense to
> be able to attach a new bpf program type to procfs that would allow to
> filter procfs entries. Then the filter could be done purely in
> userspace. If signed bpf lands one could then even ship signed programs
> that are attachable by userns root.

I'll ask the podman developers how much more comfortable they would be
using bpf to control file visibility in procfs. Thanks for the idea.

--
Rgrds, legion