On Thu, 14 Sept 2023 at 11:28, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > On Wed, Sep 13, 2023 at 05:22:35PM +0200, Miklos Szeredi wrote: > > Add a way to query attributes of a single mount instead of having to parse > > the complete /proc/$PID/mountinfo, which might be huge. > > > > Lookup the mount by the old (32bit) or new (64bit) mount ID. If a mount > > needs to be queried based on path, then statx(2) can be used to first query > > the mount ID belonging to the path. > > > > Design is based on a suggestion by Linus: > > > > "So I'd suggest something that is very much like "statfsat()", which gets > > a buffer and a length, and returns an extended "struct statfs" *AND* > > just a string description at the end." > > So what we agreed to at LSFMM was that we split filesystem option > retrieval into a separate system call and just have a very focused > statx() for mounts with just binary and non-variable sized information. > We even gave David a hard time about this. :) I would really love if we > could stick to that. > > Linus, I realize this was your suggestion a long time ago but I would > really like us to avoid structs with variable sized fields at the end of > a struct. That's just so painful for userspace and universally disliked. > If you care I can even find the LSFMM video where we have users of that > api requesting that we please don't do this. So it'd be great if you > wouldn't insist on it. I completely missed that. What I'm thinking is making it even simpler for userspace: struct statmnt { ... char *mnt_root; char *mountpoint; char *fs_type; u32 num_opts; char *opts; }; I'd still just keep options nul delimited. Is there a good reason not to return pointers (pointing to within the supplied buffer obviously) to userspace? > > This will also allow us to turn statmnt() into an extensible argument > system call versioned by size just like we do any new system calls with > struct arguments (e.g., mount_setattr(), clone3(), openat2() and so on). > Which is how we should do things like that. The mask mechanism also allow versioning of the struct. > > Other than that I really think this is on track for what we ultimately > want. > > > +struct stmt_str { > > + __u32 off; > > + __u32 len; > > +}; > > + > > +struct statmnt { > > + __u64 mask; /* What results were written [uncond] */ > > + __u32 sb_dev_major; /* Device ID */ > > + __u32 sb_dev_minor; > > + __u64 sb_magic; /* ..._SUPER_MAGIC */ > > + __u32 sb_flags; /* MS_{RDONLY,SYNCHRONOUS,DIRSYNC,LAZYTIME} */ > > + __u32 __spare1; > > + __u64 mnt_id; /* Unique ID of mount */ > > + __u64 mnt_parent_id; /* Unique ID of parent (for root == mnt_id) */ > > + __u32 mnt_id_old; /* Reused IDs used in proc/.../mountinfo */ > > + __u32 mnt_parent_id_old; > > + __u64 mnt_attr; /* MOUNT_ATTR_... */ > > + __u64 mnt_propagation; /* MS_{SHARED,SLAVE,PRIVATE,UNBINDABLE} */ > > + __u64 mnt_peer_group; /* ID of shared peer group */ > > + __u64 mnt_master; /* Mount receives propagation from this ID */ > > + __u64 propagate_from; /* Propagation from in current namespace */ > > + __u64 __spare[20]; > > + struct stmt_str mnt_root; /* Root of mount relative to root of fs */ > > + struct stmt_str mountpoint; /* Mountpoint relative to root of process */ > > + struct stmt_str fs_type; /* Filesystem type[.subtype] */ > > I think if we want to do this here we should add: > > __u64 fs_type > __u64 fs_subtype > > fs_type can just be our filesystem magic number and we introduce magic It's already there: sb_magic. However it's not a 1:1 mapping (ext* only has one magic). > numbers for sub types as well. So we don't need to use strings here. Ugh. Thanks, Miklos