On 3/23/2022 3:58 PM, Dave Chinner wrote:
On Tue, Mar 22, 2022 at 08:27:12PM +0100, Miklos Szeredi wrote:
Add a new userspace API that allows getting multiple short values in a
single syscall.
This would be useful for the following reasons:
- Calling open/read/close for many small files is inefficient. E.g. on my
desktop invoking lsof(1) results in ~60k open + read + close calls under
/proc and 90% of those are 128 bytes or less.
How does doing the open/read/close in a single syscall make this any
more efficient? All it saves is the overhead of a couple of
syscalls, it doesn't reduce any of the setup or teardown overhead
needed to read the data itself....
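Here is a minimal sketch of the per-value pattern behind the lsof numbers: one openat(2), one read(2) and one close(2) for each small /proc file. The helper name `read_small_file()` is hypothetical. It assumes that a single read() is enough for values of 128 bytes or less. Path lookup and whatever work the filesystem does to produce the data both happen inside openat() and read(). A combined syscall would remove the extra kernel entries but not that work, which is the point of the reply above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len bytes of a small file relative to
 * dirfd, paying three syscalls (openat + read + close) per value.
 * Assumes one read() returns the whole value for small files.
 * Returns the number of bytes read, or -1 with errno set.
 */
static ssize_t read_small_file(int dirfd, const char *path, char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	int fd = openat(dirfd, path, O_RDONLY | O_CLOEXEC); /* 1: lookup + open */
	if (fd < 0)
		return -1;

	ssize_t n = read(fd, buf, len);                      /* 2: fetch the data */
	int saved_errno = errno;

	close(fd);                                           /* 3: teardown */
	errno = saved_errno;                                 /* report read()'s error, not close()'s */
	return n;
}
```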
- Interfaces for getting various attributes and statistics are fragmented.
For files we have basic stat, statx, extended attributes, file attributes
(for which there are two overlapping ioctl interfaces). For mounts and
superblocks we have stat*fs as well as /proc/$PID/{mountinfo,mountstats}.
The latter also has the problem of not allowing queries on a specific
mount.
https://xkcd.com/927/
- Some attributes are cheap to generate, some are expensive. Allowing
userspace to select which ones it needs should allow optimizing queries.
- Adding an ASCII namespace should allow easy extension and self-
description.
- The values can be text or binary, whichever fits best.
The interface definition is:
struct name_val {
	const char *name;		/* in */
	struct iovec value_in;		/* in */
	struct iovec value_out;		/* out */
	uint32_t error;			/* out */
	uint32_t reserved;
};
Ahhh, XFS_IOC_ATTRMULTI_BY_HANDLE reborn. This is how xfsdump gets
and sets attributes efficiently when dumping and restoring files -
it's an interface that allows batches of xattr operations to be run
on a file in a single syscall.
I've said in the past when discussing things like statx() that maybe
everything should be addressable via the xattr namespace and
set/queried via xattr names regardless of how the filesystem stores
the data. The VFS/filesystem simply translates the name to the
storage location of the information. It might be held in xattrs, but
it could just be a flag bit in an inode field.
Then we just get named xattrs in batches from an open fd.
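For comparison, this is roughly what batching xattr reads from an open fd costs with today's interface. It is a user-space loop that calls the real fgetxattr(2) once per name. The `struct xattr_req` type and the `get_xattr_batch()` helper are hypothetical illustrations. They are not the formalised batch API described above, and they are not XFS_IOC_ATTRMULTI_BY_HANDLE. A batch syscall would replace the loop with a single kernel entry while keeping the per-entry error reporting.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical per-attribute request, loosely modelled on struct name_val. */
struct xattr_req {
	const char *name;	/* in: full xattr name, e.g. "user.foo" */
	void *buf;		/* in: caller-supplied value buffer */
	size_t size;		/* in: size of buf */
	ssize_t len;		/* out: value length, or -1 on error */
	int error;		/* out: 0, or the errno for this entry */
};

/*
 * Fetch a batch of xattrs from an open fd.  This is only the user-space
 * loop a batch syscall would collapse: it still issues one fgetxattr(2)
 * per name.  Returns the number of entries that succeeded.
 */
static size_t get_xattr_batch(int fd, struct xattr_req *reqs, size_t n)
{
	size_t ok = 0;

	for (size_t i = 0; i < n; i++) {
		assert(reqs[i].name != NULL);
		reqs[i].len = fgetxattr(fd, reqs[i].name, reqs[i].buf, reqs[i].size);
		reqs[i].error = reqs[i].len < 0 ? errno : 0;
		if (reqs[i].error == 0)
			ok++;
	}
	return ok;
}
```

A failed entry does not stop the rest of the batch. Each request carries its own error.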
int getvalues(int dfd, const char *path, struct name_val *vec, size_t num,
	      unsigned int flags);
@dfd and @path are used to look up object $ORIGIN. @vec contains @num
name/value descriptors. @flags contains lookup flags for @path.
The syscall returns the number of values filled or an error.
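A user-space mock can show how a caller would drive the proposed interface. `mock_getvalues()` only mirrors the quoted prototype and is not the syscall. It ignores @dfd, @path and @flags and serves values from a hard-coded table. It also relies on assumptions that the text above does not state:

- @value_in is the caller's buffer.
- @value_out reports where the value landed and how long it is.
- Unknown names yield ENOENT and short buffers yield ERANGE.

Its return value counts the entries filled, as described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

struct name_val {
	const char *name;		/* in */
	struct iovec value_in;		/* in: caller-supplied buffer (assumed) */
	struct iovec value_out;		/* out: where the value landed (assumed) */
	uint32_t error;			/* out */
	uint32_t reserved;
};

/* Hypothetical canned values standing in for what the kernel would look up. */
static const struct { const char *name; const char *value; } fake_values[] = {
	{ "mnt:mountpoint", "/home" },
	{ "mntns:21:parentid", "1" },
};

/*
 * User-space mock of the proposed getvalues(2) semantics, NOT the real
 * syscall.  dfd/path/flags are ignored and it never fails as a whole;
 * each entry gets its own error and the return value counts entries filled.
 */
static int mock_getvalues(int dfd, const char *path, struct name_val *vec,
			  size_t num, unsigned int flags)
{
	(void)dfd; (void)path; (void)flags;
	int filled = 0;

	for (size_t i = 0; i < num; i++) {
		struct name_val *nv = &vec[i];
		const char *val = NULL;

		for (size_t j = 0; j < sizeof fake_values / sizeof fake_values[0]; j++) {
			if (strcmp(nv->name, fake_values[j].name) == 0) {
				val = fake_values[j].value;
				break;
			}
		}
		nv->value_out.iov_base = NULL;
		nv->value_out.iov_len = 0;
		if (val == NULL) {
			nv->error = ENOENT;		/* unknown name (assumed errno) */
			continue;
		}
		size_t len = strlen(val);		/* text value, no trailing NUL */
		if (len > nv->value_in.iov_len) {
			nv->error = ERANGE;		/* buffer too small (assumed errno) */
			continue;
		}
		memcpy(nv->value_in.iov_base, val, len);
		nv->value_out.iov_base = nv->value_in.iov_base;
		nv->value_out.iov_len = len;
		nv->error = 0;
		filled++;
	}
	return filled;
}
```

The per-entry error field lets one missing or oversized value fail without aborting the rest of the batch.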
A single name/value descriptor has the following fields:
@name describes the object whose value is to be returned. E.g.
mnt - list of mount parameters
mnt:mountpoint - the mountpoint of the mount of $ORIGIN
mntns - list of mount IDs reachable from the current root
mntns:21:parentid - parent ID of the mount with ID of 21
xattr:security.selinux - the security.selinux extended attribute
data:foo/bar - the data contained in file $ORIGIN/foo/bar
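In each name above, a prefix selects the kind of object and an object-specific remainder follows it. Here is a minimal sketch of splitting a name at the first ':'. It assumes dispatch works this way; the helper `split_name()` is hypothetical and is not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split a getvalues-style name at its first ':' into a namespace prefix
 * and the remainder.  Returns the prefix length; *rest points just past
 * the ':' or, when there is no ':', at the terminating NUL.
 */
static size_t split_name(const char *name, const char **rest)
{
	assert(name != NULL && rest != NULL);
	const char *colon = strchr(name, ':');

	if (colon == NULL) {
		*rest = name + strlen(name);	/* "mnt": the whole name is the prefix */
		return strlen(name);
	}
	*rest = colon + 1;			/* "mntns:21:parentid" -> "21:parentid" */
	return (size_t)(colon - name);
}
```

Splitting only at the first ':' leaves "21:parentid" intact for a second-level lookup inside whatever handles the mntns prefix.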
How are these different from just declaring new xattr namespaces for
these things? E.g. open any file and list the xattrs in the
xattr:mount.mnt namespace to get the list of mount parameters for
that mount.
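The listing half of this alternative can be approximated with existing calls. flistxattr(2) fills a buffer with NUL-terminated names packed back to back, and a caller can pick out one namespace by prefix. The `mount.` prefix is hypothetical, since current kernels define no such namespace. The helper `filter_xattr_names()` is hypothetical as well.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Walk a buffer in the format flistxattr(2) fills in (NUL-terminated
 * names packed back to back, len bytes in total) and collect the names
 * that start with prefix.  Up to max matches are stored in out; the
 * total number of matches is returned.
 */
static size_t filter_xattr_names(const char *list, size_t len, const char *prefix,
				 const char **out, size_t max)
{
	assert(list != NULL && prefix != NULL);
	const size_t plen = strlen(prefix);
	const char *p = list, *end = list + len;
	size_t matches = 0;

	while (p < end) {
		size_t nlen = strnlen(p, (size_t)(end - p));

		if (nlen >= plen && strncmp(p, prefix, plen) == 0) {
			if (matches < max)
				out[matches] = p;
			matches++;
		}
		p += nlen;
		if (p == end)
			break;		/* malformed: last name lacks its NUL */
		p++;			/* step over the NUL terminator */
	}
	return matches;
}
```

In real use, list would come from flistxattr(fd, list, size), typically after a first call with size 0 to learn the required buffer length.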
There is a significant and vocal set of people who dislike xattrs
passionately. I often hear them whinging whenever someone proposes
using them. I think that your suggestion has all the advantages of
the getvalues(2) interface while also addressing its shortcomings.
If we could get it past the anti-xattr crowd we might have something.
You could even provide getvalues() on top of it.
Why do we need a new "xattr in everything but name" interface when
we could just extend the one we've already got and formalise a new,
cleaner version of xattr batch APIs that have been around for 20-odd
years already?
Cheers,
Dave.