On Fri, Oct 30, 2020 at 10:07:48AM -0500, Seth Forshee wrote:
> On Thu, Oct 29, 2020 at 11:37:23AM -0500, Eric W. Biederman wrote:
> > First and foremost: A uid shift on write to a filesystem is a security
> > bug waiting to happen. This is especially true in the context of
> > facilities like io_uring, which play very aggressive games with how
> > process context makes it to system calls.
> >
> > The only reason containers were not immediately exploitable when io_uring
> > was introduced is that the mechanisms are built so that even if
> > something escapes containment the security properties still apply.
> > Changes to the uid when writing to the filesystem do not have that
> > property. The tiniest slip in containment will be a security issue.
> >
> > This is not even the least bit theoretical. I have seen reports of how
> > shiftfs+overlayfs created a situation where anyone could read
> > /etc/shadow.
>
> This bug was the result of a complex interaction with several
> contributing factors. It's fair to say that one component was overlayfs
> writing through an id-shifted mount, but the primary cause was related
> to how copy-up was done coupled with allowing unprivileged overlayfs
> mounts in a user ns. Checks that the mounter had access to the lower fs
> file were not done before copying data up, and so the file was copied up
> temporarily to the id-shifted upperdir. Even though it was immediately
> removed, other factors made it possible for the user to get the file
> contents from the upperdir.
>
> Regardless, I do think you raise a good point. We need to be wary of any
> place the kernel could open files through a shifted mount, especially
> when the open could be influenced by userspace.
>
> Perhaps kernel file opens through shifted mounts should be opt-in.
> I.e. unless a flag is passed, or a different open interface used, the
> open will fail if the dentry being opened is subject to id shifting.
> This way any kernel writes which would be subject to id shifting will
> only happen through code which has been written to take it into account.

For my use cases, it would be fine to require opt-in at original fs
mount time by the init_user_ns admin. I.e.

	mount -o allow_idmap /dev/mapper/whoozit /whatzit

I'm quite certain I would always be sharing a separate LV or loopback
or tmpfs.

-serge