Re: [RFC PATCH 1/4] fs: Add FS_XFLAG_COMPRESSED & FS_XFLAG_ENCRYPTED for FS_IOC_FS[GS]ETXATTR API

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 25 Feb 2025 14:42:14 +1100

On Fri, Feb 21, 2025 at 11:34:43AM -0500, Theodore Ts'o wrote:
> I think a few people were talking past each other, because there are two
> fileds in struct fileattr --- flags, and fsx_xflags.  The flags field
> is what was originally used by FS_IOC_EXT2_[GS]ETFLAGS, which later

I don't think anyone has been confusing the two - the entire
discussion has been about fsx_xflags and the struct fsxattr...

> started getting used by many other file systems, starting with
> resierfs and btrfs, and so it became FS_IOC_[GS]ETFLAGS.  The bits in
> that flags word were both the ioctl ABI and the on-disk encoding, and
> because we were now allowing multiple file systems to allocate bits,
> and we needed to avoid stepping on each other (for example since btrfs
> started using FS_NOCOW_FL, that bit position wouldn't be used by ext4,
> at least not for a publically exported flag).
> 
> So we started running out of space in the FS_FLAG_*_FL namespace, and
> that's why we created FS_IOC_[GS]ETXATTR and the struct fsxattr.  The

No, that is most certainly not how this API came about. 

The FS_IOC_[GS]ETXATTR ioctls were first implement on IRIX close on
30 years ago. They were ported to Linux with the XFS linux port over
2 decades ago. Indeed, we've been using them for xfsdump/xfs_restore
since before XFS was ported to linux.

They got lifted to the VFS back in 2016 so that ext4 could use the
interface for getting/setting project IDs on files. This was done so
that existing userspace functionality for setting up
project/directory quotas on XFS could also be used on ext4.

> FS_XFLAG_*_FL space has plenty of space; there are 14 unassigned bit
> positions, by my count.
> 
> As far as the arguments about "proper interface design", as far as
> Linux is concerned, backwards compatibility trumps "we should have
> done if it differently".  The one and only guarantee that we have that
> FS_IOC_GETXATTR followed by FS_IOC_SETXATTR will work.  Nothing else.

That's a somewhat naive understanding of the overall API. The struct
fsxattr information is also directly exported to userspace via the
XFS blukstat ioctls. i.e. extent size hints, fsx_xflags, project
IDs, etc are all exported to userspace via multiple ioctl
interfaces.

This is all used by xfsdump/xfs_restore to be able to back up and
restore the inode state that is exposed/controlled by the
GET/SETXATTR interfaces.

> The use case of "what if a backup program wants to backup the flags
> and restore on a different file system" is one that hasn't been
> considered, and I don't think any backup programs do it today.

Wrong. As I've already said: we have been doing exactly this for 20+
years with xfsdump/restore.

xfsdump uses the bulkstat version of the GET interface, whilst
restore uses the FS_IOC_SETXATTR interface.

> For
> that matter, some of the flags, such as the NODUMP flag, are designed
> to be instructions to a dump/restore system, and not really one that
> *should* be backed up.

Yes. xfsdump sees this in the bulkstat flags field for the inode and
then omits the inode from the dump.

Further, xfs_fsr (the online file defragmenter for XFS) uses
bulkstat and looks at the FS_XFLAGS returned from bulkstat for each
inode it scans.

> Again, the only semantic that was guaranteed
> is GETXATTR or GETXATTR followed by SETXATTR.

For making a single delta state change, yes.

For the dump/restore case, calling SETXATTR on a newly created file
with a preconstructed struct fsxattr state retreived at dump time is
also supported.

This is not a general use case - it will destroy any existing state
that file was created with (e.g. override admin inheritence
settings) by overwriting it with the state from the backup.

It should also be noted that xfs_restore does this in two SETXATTR
calls, not one. i.e. it splits the set operation into a
pre-data restore SETXATTR, and one post-data restore SETXATTR.

Why?

Because stuff like extent size hints and realtime state needs to be
restored before any data is written whilst others can only be set
after the data has been written because they would otherwise prevent
data restoration:

/* extended inode flags that can only be set after all data
 * has been restored to a file.
 */
#define POST_DATA_XFLAGS        (XFS_XFLAG_IMMUTABLE |          \
                                  XFS_XFLAG_APPEND |            \
                                  XFS_XFLAG_SYNC)

Yup, you can't restore data to the file if it has already been
marked as immutable....

IOWs, any changes to the flag space also needs to be compatible with
the XFS bulkstat shadowing of the fsxattr fields+flags and the
existing usage of these APIs by xfsdump, xfs_restore and xfs_fsr.

> We can define some new interface for return what xflags are supported
> by a particular file system.

Why do we even care?

On the get side, it just doesn't matter - if the flag isn't set, it
either is not active or not supported. Either way, it doesn't
matter if there's a "this is supported mask".

On the set side, adding a mask isn't going to change historic
behaviour: existing applications will ignore the new mask because
they haven't been coded to understand it. And vice versa, an old
kernel will ignore the feature mask if the app uses it because it
ignores unknown flags/fields.

IOWs, adding a feature mask doesn't solve any of the problems
related to forwards/backwards compatibility of new features, and so
we are back to needing to use the API as a GET/SET pair where the
GET sets all the known state correctly such that a SET operation
will either do nothing, change the state or return an error because
an invalid combination of known parameters was passed.

> I suppose the field could double as the bitmask field when
> FS_IOC_SETXATTR is called, but that just seems to be an overly complex
> set of semantics.  If someone really wants to do that, I wouldn't
> really complain, but then what we would actually call the field
> "flags_supported_on_get_bitmask_on_set" would seem a bit wordy.  :-)

That effectively prevents the existing dump/restore usage of the
API.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx