Re: [PATCH 1/4] statx: Add a system call to make enhanced file info available

Andreas Dilger <adilger@xxxxxxxxx> · Thu, 17 Nov 2016 20:28:57 -0700

On Nov 17, 2016, at 4:40 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> 
> On Thu, Nov 17, 2016 at 01:35:03PM +0000, David Howells wrote:
>> Add a system call to make extended file information available, including
>> file creation, data version and some attribute flags where available
>> through the underlying filesystem.
>> 
>> 
>> ========
>> OVERVIEW
>> ========
>> 
>> The idea was initially proposed as a set of xattrs that could be retrieved
>> with getxattr(), but the general preferance proved to be for a new syscall
>> with an extended stat structure.
>> 
>> This has a number of uses:
>> 
> ....
>> (5) Data version number: Could be used by userspace NFS servers [Aneesh
>>     Kumar].
>> 
>>     Can also be used to modify fill_post_wcc() in NFSD which retrieves
>>     i_version directly, but has just called vfs_getattr().  It could get
>>     it from the kstat struct if it used vfs_xgetattr() instead.
> 
> This needs a much clearer name that stx_version as "version" is
> entirely ambiguous. e.g. Inodes have internal version numbers to
> disambiguate life cycles. and there are versioning filesystems
> which have multiple versions of the same file.
> 
> So if stx_version this is intended to export the internal filesystem
> inode change counter (i.e. inode->i_version) then lets call it that:
> stx_modification_count. It's clear and unambiguous as to what it
> represents, especially as this counter is more than just a "data
> modification" counter - inode metadata modifications will also
> cause it to change...

Honestly, I don't think "modification count" is necessarily more clear
than "version".  This value isn't necessarily a counter incremented
to hold the number of times the inode was modified, but is used by NFS
only to determine whether the inode has been modified from one access to
the next.  It may not be incremented sequentially for each inode, but
rather be a global value across all inodes in the filesystem.  It is
much more important to clearly document what this version field means.
Maybe "stx_modification_version", but I'm fine with "version" as well,
since this is what the field is named in the kernel.

>> (13) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.
>>     Note that the Linux IOC flags are a mess and filesystems such as Ext4
>>     define flags that aren't in linux/fs.h, so translation in the kernel
>>     may be a necessity (or, possibly, we provide the filesystem type too).
> 
> And we now also have FS_IOC_FSGETXATTR that extends the flags
> and information userspace can get from filesystems. It makes little
> sense to now add xstat() and not add everything this interface
> also exports...

The number of flags available will always be a moving target.  Better
to get the core functionality landed, and then add in new flags in a
later patch, otherwise this patch will never be landed.

>> The defined bits in request_mask and stx_mask are:
>> 
>> 	STATX_TYPE		Want/got stx_mode & S_IFMT
>> 	STATX_MODE		Want/got stx_mode & ~S_IFMT
>> 	STATX_NLINK		Want/got stx_nlink
>> 	STATX_UID		Want/got stx_uid
>> 	STATX_GID		Want/got stx_gid
>> 	STATX_ATIME		Want/got stx_atime{,_ns}
>> 	STATX_MTIME		Want/got stx_mtime{,_ns}
>> 	STATX_CTIME		Want/got stx_ctime{,_ns}
>> 	STATX_INO		Want/got stx_ino
>> 	STATX_SIZE		Want/got stx_size
>> 	STATX_BLOCKS		Want/got stx_blocks
>> 	STATX_BASIC_STATS	[The stuff in the normal stat struct]
>> 	STATX_BTIME		Want/got stx_btime{,_ns}
>> 	STATX_VERSION		Want/got stx_version
>> 	STATX_ALL		[All currently available stuff]
> 
> What happens when an application uses STATX_ALL from a future kernel
> that defines more flags than are initially supported, and that
> application then is run on a kernel that onyl supports the initial
> fields?

Fields that are unknown by the current kernel/filesystem will not be set,
and this is reflected in the flags that are returned to userspace.

The minimum required flags are the intersection of the requested flags
from userspace and the flags known by the kernel/filesystem.  It is
possible for the filesystem to optionally return extra flags/fields if
they are free to provide, but it should always return the requested
flags *if* it knows what those flags mean.

>> stx_btime is the file creation time; stx_version is the data version
>> number (i_version); stx_mask is a bitmask indicating the data provided;
>> and __spares*[] are where as-yet undefined fields can be placed.
>> 
>> Time fields are split into separate seconds and nanoseconds fields to make
>> packing easier and the granularities can be queried with the filesystem
>> info system call.  Note that times will be negative if before 1970; in
>> such a case, the nanosecond fields will also be negative if not zero.
> 
> So what happens in ten years time when we want to support
> femptosecond resolution in the timestamp interface? We've got to
> change everything to 64 bit? Shouldn't we just make everything
> timestamp related 64 bit?

Is this a serious request?  Are we going to need to multiply everything
by 10e9 to convert to/from nanoseconds for the next 10 years on the off
chance that we have timestamps more accurate than this in the future?
We could just as easily add extra 32-bit fields in the future to hold
the fraction of nanoseconds if we ever actually need this.  Given that
we've totally bailed on keeping accurate atimes below a day, I'm not
really worried about needing this.

>> The bits defined in the stx_attributes field convey information about a
>> file, how it is accessed, where it is and what it does.  The following
>> attributes map to FS_*_FL flags and are the same numerical value:
> 
> Please isolate the new interface flags completely from the FS_*_FL
> values. We should not repeat the mistake of tying values derived
> from filesystem specific on-disk values to a user interface.

Using the existing FS_*_FL flags as initial values is not worse than
starting with any other arbitrary values for the flags.

>> 	STATX_ATTR_COMPRESSED		File is compressed by the fs
>> 	STATX_ATTR_IMMUTABLE		File is marked immutable
>> 	STATX_ATTR_APPEND		File is append-only
>> 	STATX_ATTR_NODUMP		File is not to be dumped
>> 	STATX_ATTR_ENCRYPTED		File requires key to decrypt in fs
>> 
>> The supported flags are listed by:
>> 
>> 	STATX_ATTR_FS_IOC_FLAGS
> 
> Again, we have many more common and extended flags than this.
> NOATIME and SYNC are two that immediately come to mind as generic
> flags that should be in this...

Sure, and they can be added incrementally in a later patch.  I'm not
sure why NOATIME and SYNC are missing, and I'm not against adding them,
but it is equally likely that they were removed in a previous round of
bikeshedding to avoid some real or perceived issue, so that this patch
can finally land rather than being in limbo for another 5 years.

>> [Are any other IOC flags of sufficient general interest to be exposed
>> through this interface?]
>> 
>> New flags include:
>> 
>> 	STATX_ATTR_NONUNIX_OWNERSHIP	File doesn't have Unixy ownership
>> 	STATX_ATTR_HAS_ACL		File has an ACL
> 
> So statx will require us to do ACL lookups? i.e. instead of just
> reading the inode to get the information, we'll also have to do
> extended attribute lookups? That's potentially very expensive if
> the extended attribute is not stored in the inode....

No, there is no requirement to return anything that the caller didn't
ask for.  Only fields that are explicitly requested need to be returned,
and others can optionally be returned if it is easy for the filesystem
to do so.

>> 	STATX_ATTR_KERNEL_API		File is kernel API (eg: procfs/sysfs)
>> 	STATX_ATTR_REMOTE		File is remote and needs network
>> 	STATX_ATTR_FABRICATED		File was made up by fs
> 
> Every file is fabricated by a filesystem :P
> 
> Perhaps you're wanting "virtual file" because it is has no physical
> presence?
> 
>> Fields in struct statx come in a number of classes:
>> 
>> (0) stx_dev_*, stx_blksize.
>> 
>>     These are local system information and are always available.
> 
> What does stx_blksize actually mean? It's completely ambiguous in
> stat() because we don't actually report the physical block size
> here - we report the "minimum unit of efficient IO" that we expect
> applications to use. Please define :P
> 
> 
>> =======
>> TESTING
>> =======
>> 
>> The following test program can be used to test the statx system call:
>> 
>> 	samples/statx/test-statx.c
>> 
>> Just compile and run, passing it paths to the files you want to examine.
>> The file is built automatically if CONFIG_SAMPLES is enabled.
> 
> Can we get xfstests written to exercise and validate all this
> functionality, please? I'd suggest that adding xfs_io support for
> the statx syscall would be far more useful for xfstests than a
> standalone  test program, too. We already have equivalent stat()
> functionality in xfs_io and that's used quite a bit in xfstests....
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Cheers, Andreas

Attachment:
signature.asc

Description: Message signed with OpenPGP using GPGMail