On Mon, May 9, 2022 at 3:48 PM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > On Tue, May 03, 2022 at 02:23:23PM +0200, Miklos Szeredi wrote: > > This is a simplification of the getvalues(2) prototype and moving it to the > > getxattr(2) interface, as suggested by Dave. > > > > The patch itself just adds the possibility to retrieve a single line of > > /proc/$$/mountinfo (which was the basic requirement from which the fsinfo > > patchset grew out of). > > > > But this should be able to serve Amir's per-sb iostats, as well as a host of > > other cases where some statistic needs to be retrieved from some object. Note: > > a filesystem object often represents other kinds of objects (such as processes > > in /proc) so this is not limited to fs attributes. > > > > This also opens up the interface to setting attributes via setxattr(2). > > > > After some pondering I made the namespace so: > > > > : - root > > bar - an attribute > > foo: - a folder (can contain attributes and/or folders) > > > > The contents of a folder is represented by a null separated list of names. > > > > Examples: > > > > $ getfattr -etext -n ":" . > > # file: . > > :="mnt:\000mntns:" > > > > $ getfattr -etext -n ":mnt:" . > > # file: . > > :mnt:="info" > > > > $ getfattr -etext -n ":mnt:info" . > > # file: . > > :mnt:info="21 1 254:0 / / rw,relatime - ext4 /dev/root rw\012" > > Hey Miklos, > > One comment about this. We really need to have this interface support > giving us mount options like "relatime" back in numeric form (I assume > this will be possible.). It is royally annoying having to maintain a > mapping table in userspace just to do: > > relatime -> MS_RELATIME/MOUNT_ATTR_RELATIME > ro -> MS_RDONLY/MOUNT_ATTR_RDONLY > > A library shouldn't be required to use this interface. Conservative > low-level software that keeps its shared library dependencies minimal > will need to be able to use that interface without having to go to an > external library that transforms text-based output to binary form (Which > I'm very sure will need to happen if we go with a text-based > interface.). > No need for a library. We can export: :mnt:attr:flags (in hex format) > > > > $ getfattr -etext -n ":mntns:" . > > # file: . > > :mntns:="21:\00022:\00024:\00025:\00023:\00026:\00027:\00028:\00029:\00030:\00031:" > > > > $ getfattr -etext -n ":mntns:28:" . > > # file: . > > :mntns:28:="info" > > > > Comments? > > I'm not a fan of text-based APIs and I'm particularly not a fan of the > xattr APIs. But at this point I'm ready to compromise on a lot as long > as it gets us values out of the kernel in some way. :) > > I had to use xattrs extensively in various low-level userspace projects > and they continue to be a source of races and memory bugs. > > A few initial questions: > > * The xattr APIs often require the caller to do sm like (copying some go > code quickly as I have that lying around): > > for _, x := range split { > xattr := string(x) > // Call Getxattr() twice: First, to determine the size of the > // buffer we need to allocate to store the extended attributes, > // second, to actually store the extended attributes in the > // buffer. Also, check if the size of the extended attribute > // hasn't increased between the two calls. > pre, err = unix.Getxattr(path, xattr, nil) > if err != nil || pre < 0 { > return nil, err > } > > dest = make([]byte, pre) > post := 0 > if pre > 0 { > post, err = unix.Getxattr(path, xattr, dest) > if err != nil || post < 0 { > return nil, err > } > } > > if post > pre { > return nil, fmt.Errorf("Extended attribute '%s' size increased from %d to %d during retrieval", xattr, pre, post) > } > > xattrs[xattr] = string(dest) > } > > This pattern of requesting the size first by passing empty arguments, > then allocating the buffer and then passing down that buffer to > retrieve that value is really annoying to use and error prone (I do > of course understand why it exists.). > > For real xattrs it's not that bad because we can assume that these > values don't change often and so the race window between > getxattr(GET_SIZE) and getxattr(GET_VALUES) often doesn't matter. But > fwiw, the post > pre check doesn't exist for no reason; we do indeed > hit that race. It is not really a race, you can do {} while (errno != ERANGE) and there will be no race. > > In addition, it is costly having to call getxattr() twice. Again, for > retrieving xattrs it often doesn't matter because it's not a super > common operation but for mount and other info it might matter. > samba and many other projects that care about efficiency solved this a long time ago with an opportunistic buffer - never start with NULL buffer most values will fit in a 1K buffer. > Will we have to use the same pattern for mnt and other info as well? > If so, I worry that the race is way more likely than it is for real > xattrs. > > * Would it be possible to support binary output with this interface? > I really think users would love to have an interfact where they can > get a struct with binary info back. I'm not advocating to make the > whole interface binary but I wouldn't mind having the option to > support it. > Especially for some information at least. I'd really love to have a > way go get a struct mount_info or whatever back that gives me all the > details about a mount encompassed in a single struct. > I suggested that up thread and Greg has explicitly and loudly NACKed it - so you will have to take it up with him > Callers like systemd will have to parse text and will end up > converting everything from text into binary anyway; especially for > mount information. So giving them an option for this out of the box > would be quite good. > > Interfaces like statx aim to be as fast as possible because we exptect > them to be called quite often. Retrieving mount info is quite costly > and is done quite often as well. Maybe not for all software but for a > lot of low-level software. Especially when starting services in > systemd a lot of mount parsing happens similar when starting > containers in runtimes. > This API is not for *everything*. Obviously it does not replace statx and some info (like the cifs OFFLINE flag) should be added to statx. Whether or not mount info needs to get a special treatment like statx is not proven. Miklos claims this is a notification issue- With David Howells' mount notification API, systemd can be pointed at the new mount that was added/removed/changed and then systemd will rarely need to parse thousands of mounts info. > * If we decide to go forward with this interface - and I think I > mentioned this in the lsfmm session - could we please at least add a > new system call? It really feels wrong to retrieve mount and other > information through the xattr interfaces. They aren't really xattrs. > > Imho, xattrs are a bit like a wonky version of streams already (One of > the reasons I find them quite unpleasant.). Making mount and other > information retrievable directly through the getxattr() interface will > turn them into a full-on streams implementation imho. I'd prefer not > to do that (Which is another reason I'd prefer at least a separate > system call.). If you are thinking about a read() like interface for xattr or any alternative data stream then Linus has NACKed it times before. However, we could add getxattr_multi() as Dave Chinner suggested for enumerating multiple keys+ (optional) values. In contrast to listxattr(), getxattr_multi() could allow a "short read" at least w.r.t the size of the vector. Thanks, Amir.