On Tue, Aug 15, 2023 at 2:30 AM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > On Thu, Aug 10, 2023 at 02:00:43AM -0700, Sargun Dhillon wrote: > > We support locking certain mount attributes in the kernel. This API > > isn't directly exposed to users. Right now, users can lock mount > > attributes by going through the process of creating a new user > > namespaces, and when the mounts are copied to the "lower privilege" > > domain, they're locked. The mount can be reopened, and passed around > > as a "locked mount". > > Not sure if that's what you're getting at but you can actually fully > create these locked mounts already: > > P1 P2 > # init userns + init mountns # init userns + init mountns > sudo mount --bind /foo /bar > sudo mount --bind -o ro,nosuid,nodev,noexec /bar > > # unprivileged userns + unprivileged mountns > unshare --mount --user --map-root > > mount --bind -oremount > > fd = open_tree(/bar, OPEN_TREE_CLONE) > > send(fd_send, P2); > > recv(&fd_recv, P1) > > move_mount(fd_recv, /locked-mnt); > > and now you have a fully locked mount on the host for P2. Did you mean that? > Yep. Doing this within a program without clone / fork is awkward. Forking and unsharing in random C++ programs doesn't always go super well, so in my mind it'd be nice to have an API to do this directly. In addition, having the superblock continue to be owned by the userns that its mounted in is nice because then they can toggle the other mount attributes (nodev, nosuid, noexec are the ones we care about). > > > > Locked mounts are useful, for example, in container execution without > > user namespaces, where you may want to expose some host data as read > > only without allowing the container to remount the mount as mutable. > > > > The API currently requires that the given privilege is taken away > > while or before locking the flag in the less privileged position. > > This could be relaxed in the future, where the user is allowed to > > remount the mount as read only, but once they do, they cannot make > > it read only again. > > s/read only/read write/ > > > > > Right now, this allows for all flags that are lockable via the > > userns unshare trick to be locked, other than the atime related > > ones. This is because the semantics of what the "less privileged" > > position is around the atime flags is unclear. > > I think that atime stuff doesn't really make sense to expose to > userspace. That seems a bit pointless imho. > > > > > Signed-off-by: Sargun Dhillon <sargun@xxxxxxxxx> > > --- > > fs/namespace.c | 40 +++++++++++++++++++++++++++++++++++--- > > include/uapi/linux/mount.h | 2 ++ > > 2 files changed, 39 insertions(+), 3 deletions(-) > > > > diff --git a/fs/namespace.c b/fs/namespace.c > > index 54847db5b819..5396e544ac84 100644 > > --- a/fs/namespace.c > > +++ b/fs/namespace.c > > @@ -78,6 +78,7 @@ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */ > > struct mount_kattr { > > unsigned int attr_set; > > unsigned int attr_clr; > > + unsigned int attr_lock; > > So when I originally noted down this crazy idea > https://github.com/uapi-group/kernel-features > I didn't envision a new struct member but rather a flag that could be > raised in attr_set like MOUNT_ATTR_LOCK that would indicate for the > other flags in attr_set to become locked. > > So if we could avoid growing the struct pointlessly I'd prefer that. Is > there a reason that wouldn't work? No reason. The semantics were just a little more awkward, IMHO. Specifically: * This attr could never be cleared, only set, which didn't seem to follow the attr_set / attr_clr semantics * If we ever introduced a mount_getattr call, you'd want to expose each of the locked bits independently, I'd think, and exposing that through one flag wouldn't give you the same fidelity. > > I have no strong feelings about this tbh. It seems useful overall to > have this ability. But it deviates a bit from regular mount semantics in > that you can lock mount properties for the lifetime of the mount > explicitly.