Re: [RFC PATCH 2/3] add statmnt(2) syscall

Christian Brauner <brauner@xxxxxxxxxx> · Mon, 18 Sep 2023 15:51:42 +0200

> Atomicity of getting a snapshot of the current mount tree with all of
> its attributes was never guaranteed, although reading
> /proc/self/mountinfo into a sufficiently large buffer would work that
> way.   However, I don't see why mount trees would require stronger
> guarantees than dentry trees (for which we have basically none).

So atomicity was never put forward as a requirement. In that
session/recording I explicitly state that we won't guarantee atomicity.
And systemd agreed with this. So I think we're all on the same page.

> Even more type clean interface:
> 
> struct statmnt *statmnt(u64 mnt_id, u64 mask, void *buf, size_t
> bufsize, unsigned int flags);
> 
> Kernel would return a fully initialized struct with the numeric as
> well as string fields filled.  That part is trivial for userspace to
> deal with.

I really would prefer a properly typed struct and that's what everyone
was happy with in the session as well. So I would not like to change the
main parameters.

> > Plus, the format for how to return arbitrary filesystem mount options
> > warrants a separate discussion imho as that's not really vfs level
> > information.
> 
> Okay.   Let's take fs options out of this.

Thanks.

> 
> That leaves:
> 
>  - fs type and optionally subtype

So since subtype is FUSE specific it might be better to move this to
filesystem specific options imho.

>  - root of mount within fs
>  - mountpoint path
> 
> The type and subtype are naturally limited to sane sizes, those are
> not an issue.

What's the limit for fstype actually? I don't think there is one.
There's one by chance but not by design afaict?

Maybe crazy idea:
That magic number thing that we do in include/uapi/linux/magic.h
is there a good reason for this or why don't we just add a proper,
simple enum:

enum {
        FS_TYPE_ADFS        1
        FS_TYPE_AFFS        2
        FS_TYPE_AFS         3
        FS_TYPE_AUTOFS      4
	FS_TYPE_EXT2	    5
	FS_TYPE_EXT3	    6
	FS_TYPE_EXT4	    7
	.
	.
	.
	FS_TYPE_MAX
}

that we start returning from statmount(). We can still return both the
old and the new fstype? It always felt a bit odd that fs developers to
just select a magic number.

> 
> For paths the evolution of the relevant system/library calls was:
> 
>   char *getwd(char buf[PATH_MAX]);
>   char *getcwd(char *buf, size_t size);
>   char *get_current_dir_name(void);
> 
> It started out using a fixed size buffer, then a variable sized
> buffer, then an automatically allocated buffer by the library, hiding
> the need to resize on overflow.
> 
> The latest style is suitable for the statmnt() call as well, if we
> worry about pleasantness of the API.

So, can we then do the following struct:

struct statmnt {
        __u64 mask;             /* What results were written [uncond] */
        __u32 sb_dev_major;     /* Device ID */
        __u32 sb_dev_minor;
        __u64 sb_magic;         /* ..._SUPER_MAGIC */
        __u32 sb_flags;         /* MS_{RDONLY,SYNCHRONOUS,DIRSYNC,LAZYTIME} */
        __u32 __spare1;
        __u64 mnt_id;           /* Unique ID of mount */
        __u64 mnt_parent_id;    /* Unique ID of parent (for root == mnt_id) */
        __u32 mnt_id_old;       /* Reused IDs used in proc/.../mountinfo */
        __u32 mnt_parent_id_old;
        __u64 mnt_attr;         /* MOUNT_ATTR_... */
        __u64 mnt_propagation;  /* MS_{SHARED,SLAVE,PRIVATE,UNBINDABLE} */
        __u64 mnt_peer_group;   /* ID of shared peer group */
        __u64 mnt_master;       /* Mount receives propagation from this ID */
        __u64 propagate_from;   /* Propagation from in current namespace */
	__aligned_u64 mountpoint;
	__u32 mountpoint_len;
	__aligned_u64 mountroot;
	__u32 mountroot_len;
        __u64 __spare[20];
};

Userspace knows already how to deal with that because of bpf and other
structs (e.g., both systemd and LXC have ptr_to_u64() helpers and so
on). Libmount and glibc can hide this away internally as well.