On Thu, Aug 10, 2023 at 02:00:43AM -0700, Sargun Dhillon wrote: > We support locking certain mount attributes in the kernel. This API > isn't directly exposed to users. Right now, users can lock mount > attributes by going through the process of creating a new user > namespaces, and when the mounts are copied to the "lower privilege" > domain, they're locked. The mount can be reopened, and passed around > as a "locked mount". Not sure if that's what you're getting at but you can actually fully create these locked mounts already: P1 P2 # init userns + init mountns # init userns + init mountns sudo mount --bind /foo /bar sudo mount --bind -o ro,nosuid,nodev,noexec /bar # unprivileged userns + unprivileged mountns unshare --mount --user --map-root mount --bind -oremount fd = open_tree(/bar, OPEN_TREE_CLONE) send(fd_send, P2); recv(&fd_recv, P1) move_mount(fd_recv, /locked-mnt); and now you have a fully locked mount on the host for P2. Did you mean that? > > Locked mounts are useful, for example, in container execution without > user namespaces, where you may want to expose some host data as read > only without allowing the container to remount the mount as mutable. > > The API currently requires that the given privilege is taken away > while or before locking the flag in the less privileged position. > This could be relaxed in the future, where the user is allowed to > remount the mount as read only, but once they do, they cannot make > it read only again. s/read only/read write/ > > Right now, this allows for all flags that are lockable via the > userns unshare trick to be locked, other than the atime related > ones. This is because the semantics of what the "less privileged" > position is around the atime flags is unclear. I think that atime stuff doesn't really make sense to expose to userspace. That seems a bit pointless imho. > > Signed-off-by: Sargun Dhillon <sargun@xxxxxxxxx> > --- > fs/namespace.c | 40 +++++++++++++++++++++++++++++++++++--- > include/uapi/linux/mount.h | 2 ++ > 2 files changed, 39 insertions(+), 3 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index 54847db5b819..5396e544ac84 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -78,6 +78,7 @@ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */ > struct mount_kattr { > unsigned int attr_set; > unsigned int attr_clr; > + unsigned int attr_lock; So when I originally noted down this crazy idea https://github.com/uapi-group/kernel-features I didn't envision a new struct member but rather a flag that could be raised in attr_set like MOUNT_ATTR_LOCK that would indicate for the other flags in attr_set to become locked. So if we could avoid growing the struct pointlessly I'd prefer that. Is there a reason that wouldn't work? I have no strong feelings about this tbh. It seems useful overall to have this ability. But it deviates a bit from regular mount semantics in that you can lock mount properties for the lifetime of the mount explicitly.