Re: User-visible context-mount API

Miklos Szeredi <mszeredi@xxxxxxxxxx> · Tue, 16 Jan 2018 10:01:25 +0100

[Adding linux-api@vger]

On Mon, Jan 15, 2018 at 5:07 PM, David Howells <dhowells@xxxxxxxxxx> wrote:
> I've been looking at the context-mount API visible to userspace as I need to
> adjust the security ops to handle it.  I'm thinking I probably need something
> like the following system calls.  Note that:
>
>  topology_flags are MS_PRIVATE, MS_SLAVE, MS_SHARED, MS_UNBINDABLE.
>
>  mount_flags are things like MS_NOSUID, MS_NODEV, MS_NOEXEC that get
>  translated to MNT_* flags in the kernel.
>
>  (1) Open a filesystem and create a blank context from it:
>
>         fd = fsopen(const char *fs_name, unsigned int flags, ...);
>
>      where flags includes FSOPEN_CLOEXEC, FSOPEN_CREATE_ONLY (don't reuse
>      superblock).
>
>  (2) Access and change the context:
>
>         write(fd, "<command>", ...);
>         read(fd, ...);
>         ioctl(fd, ...);
>
>  (3) Create and set up a context for an existing mountpoint:
>
>         fd = fspick(int dfd, const char *path, unsigned int flags);
>
>      where flags includes FSPICK_CLOEXEC.
>
>  (4) Create a mountpoint on a path, using a context to supply the superblock
>      details:
>
>         mount_create(int fd, int dfd, const char *path,
>                      unsigned int topology_flags,
>                      unsigned int mount_flags);
>
>  (5) Move a mount:
>
>         mount_move(int from_dfd, const char *frompath,
>                    int to_dfd, const char *topath);
>
>      This might want to take new topology flags also.
>
>  (6) Adjust a mountpoint's topology flags:
>
>         mount_set_topology(int dfd, const char *path,
>                            unsigned int topology_flags);
>
>  (7) Reconfigure a mountpoint:
>
>         mount_reconfigure(int dfd, const char *path,
>                           unsigned int mount_flags);

What's the fundamental difference between topology flags and other
flags?  Why two syscalls?

Also I think we need a "mask" argument telling the kernel which flags
need to be changed.
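
A minimal sketch of what such a mask could mean, assuming the usual
"only touch the bits named in the mask" semantics.  The helper name
apply_masked_flags() is made up for illustration and is not a proposed
interface; the MS_* constants are the existing ones from <sys/mount.h>.

```c
#include <sys/mount.h>   /* MS_NOSUID, MS_NODEV, MS_NOEXEC */

/*
 * Hypothetical helper, not part of any proposed syscall: compute the new
 * flag word from the current one.  Only the bits set in 'mask' are taken
 * from 'flags'; every other bit keeps its current value.
 */
static unsigned int apply_masked_flags(unsigned int current,
                                       unsigned int flags,
                                       unsigned int mask)
{
        return (current & ~mask) | (flags & mask);
}
```

With current = MS_NOSUID | MS_NODEV, flags = MS_NOEXEC and
mask = MS_NOEXEC | MS_NODEV, this sets noexec, clears nodev and leaves
nosuid alone.  Without a mask the caller would have to read back the
current flags first, which is racy.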

>
>  (8) Change R/O protection on a mountpoint:
>
>         mount_protect(int dfd, const char *path,
>                       bool read_only);
>
>      This involves changing the R/O protection on the superblock also, but
>      might be mergeable with mount_reconfigure().

Methinks this should be merged with mount_reconfigure(), and if
superblock state needs to be changed, then that should be done with
the "remount" procedure below.

> Note that two things are missing from the list:
>
>  (1) Bind mount.  This is done by:
>
>         fd = fspick("/mnt/a");
>         mount_create(fd, ..., "/mnt/b", ...);
>         mount_create(fd, ..., "/mnt/c", ...);
>         mount_create(fd, ..., "/mnt/d", ...);
>
>  (2) Remount.  Superblock reconfiguration is done by something like:
>
>         fd = fspick("/mnt/a");
>         write(fd, "? fs");
>         read(fd, filesystem_type);
>         write(fd, "o user_xattr"); // Indicate changes to be made
>         write(fd, "x reconfigure"); // Perform the reconfiguration
>
> Thinking further on this, maybe I should make a mountpoint-context also, so
> that it can be loaded up with target namespace information and other goodies.
> This would vastly expand the parameter space for a mountpoint beyond the few
> syscall args available.  Creating a new mount might then look like:
>
>         sbfd = fsopen("ext4");
>         write(sbfd, "d /dev/sda1");
>         write(sbfd, "o user_xattr");
>         write(sbfd, "x commit");
>
>         mfd = mount_new();
>         write(mfd, "ns mnt 123"); // where fd 123 refers to a mount namespace
>         write(mfd, "o bind=1"); // Set MS_BIND

What does MS_BIND mean here?

>         write(mfd, "o nosuid=1"); // Set MS_NOSUID
>
>         mount_create(mfd, AT_FDCWD, "/mnt/a", sbfd);

Yeah, more flexible, but also more complicated, with mount_create()
now taking 3 file descriptors, ugh...

Thanks,
Miklos