On 28 Sep 2021 at 05:16, Dave Chinner wrote: > On Thu, Sep 16, 2021 at 03:36:42PM +0530, Chandan Babu R wrote: >> This commit renames extent counter fields in "struct xfs_dinode" and "struct >> xfs_log_dinode" based on the width of the fields. As of this commit, the >> 32-bit field will be used to count data fork extents and the 16-bit field will >> be used to count attr fork extents. >> >> This change is done to enable a future commit to introduce a new 64-bit extent >> counter field. >> >> Signed-off-by: Chandan Babu R <chandan.babu@xxxxxxxxxx> >> --- >> fs/xfs/libxfs/xfs_format.h | 8 ++++---- >> fs/xfs/libxfs/xfs_inode_buf.c | 4 ++-- >> fs/xfs/libxfs/xfs_log_format.h | 4 ++-- >> fs/xfs/scrub/inode_repair.c | 4 ++-- >> fs/xfs/scrub/trace.h | 14 +++++++------- >> fs/xfs/xfs_inode_item.c | 4 ++-- >> fs/xfs/xfs_inode_item_recover.c | 8 ++++---- >> 7 files changed, 23 insertions(+), 23 deletions(-) >> >> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h >> index dba868f2c3e3..87c927d912f6 100644 >> --- a/fs/xfs/libxfs/xfs_format.h >> +++ b/fs/xfs/libxfs/xfs_format.h >> @@ -802,8 +802,8 @@ typedef struct xfs_dinode { >> __be64 di_size; /* number of bytes in file */ >> __be64 di_nblocks; /* # of direct & btree blocks used */ >> __be32 di_extsize; /* basic/minimum extent size for file */ >> - __be32 di_nextents; /* number of extents in data fork */ >> - __be16 di_anextents; /* number of extents in attribute fork*/ >> + __be32 di_nextents32; /* number of extents in data fork */ >> + __be16 di_nextents16; /* number of extents in attribute fork*/ > > > Hmmm. Having the same field in the inode hold the extent count > for different inode forks based on a bit in the superblock means the > on-disk inode format is not self describing. i.e. we can't decode > the on-disk contents of an inode correctly without knowing whether a > specific feature bit is set in the superblock or not. > > Right now we don't have use external configs to decode the inode. > Feature level conditional fields are determined by inode version, > not superblock bits. Optional feature fields easy to deal with - > zero if the feature is not in use, otherwise we assume it is in use > and can validity check it appropriately. IOWs, we don't need > to look at sb feature bits to decode and validate inode fields. > > This change means that we can't determine if the extent counts are > correct just by looking at the on-disk inode. If we just have > di_nextents32 set to a non-zero value, does that mean we should have > data fork extents or attribute fork extents present? > > Just looking at whether the attr fork is initialised is not > sufficient - it can be initialised with zero attr fork extents > present. We can't look at the literal area contents, either, > because we don't zero that when we shrink it. We can't look at > di_nblocks, because that counts both attr and data for blocks. We > can't look at di_size, because we can have data extents beyond EOF > and hence a size of zero doesn't mean the data fork is empty. > > So if both forks are in extent format, they could be either both > empty, both contain extents or only one fork contains extents but we > can't tell which state is the correct one. Hence > if di_nextents64 if zero, we don't know if di_nextents32 is a count > of attribute extents or data extents without first looking at the > superblock feature bit to determine if di_nextents64 is in use or > not. The inode format is not self describing anymore. > > When XFS introduced 32 bit link counts, the inode version was bumped > from v1 to v2 because it redefined fields in the inode structure > similar to this proposal[1]. The verison number was then used to > determine if the inode was in old or new format - it was a self > describing format change. Hence If we are going to redefine > di_nextents to be able to hold either data fork extent count (old > format) or attr fork extent count (new format) we really need to > bump the inode version so that we can discriminate between the two > inode formats just by looking at the inode itself. > > If we don't want to bump the version, then we need to do something > like: > > - __be32 di_nextents; /* number of extents in data fork */ > - __be16 di_anextents; /* number of extents in attribute fork*/ > + __be32 di_nextents_old;/* old number of extents in data fork */ > + __be16 di_anextents_old;/* old number of extents in attribute fork*/ > ..... > - __u8 di_pad2[12]; > + __be64 di_nextents; /* number of extents in data fork */ > + __be32 di_anextents; /* number of extents in attribute fork*/ > + __u8 di_pad2[4]; > > So that there is no ambiguity in the on-disk format between the two > formats - if one set is non-zero, the other set must be zero in this > sort of setup. > > However, I think that redefining the fields and bumping the inode > version is the better long term strategy, as it allows future reuse > of the di_anextents_old field, and it uses less of the small amount > of unused padding we have remaining in the on-disk inode core. > > At which point, the feature bit in the superblock becomes "has v4 > inodes", not "has big extent counts". We then use v4 inode format in > memory for everything (i.e. 64 bit extent counts) and convert > to/from the ondisk format at IO time like we do with v1/v2 inodes. > > Thoughts? The patch "xfs: Extend per-inode extent counter widths" (which appears later in the series) adds the new per-inode flag XFS_DIFLAG2_NREXT64. This flag is set on inodes which use 64-bit data fork extent counter and 32-bit attr fork extent counter fields. Verifiers can check for the presence/absence of this flag to determine which extent counter fields to use for verification of an xfs_dinode structure. Hence, XFS_DIFLAG2_NREXT64 flag should be sufficient for maintaining the self describing nature of XFS inodes right? > > -Dave. > > [1] The change to v2 inodes back in 1995 removed the filesystem UUID > from the inode and was replaced with a 32 bit link counter, a project ID > value and padding: > > @@ -36,10 +38,12 @@ typedef struct xfs_dinode_core > __uint16_t di_mode; /* mode and type of file */ > __int8_t di_version; /* inode version */ > __int8_t di_format; /* format of di_c data */ > - __uint16_t di_nlink; /* number of links to file */ > + __uint16_t di_onlink; /* old number of links to file */ > __uint32_t di_uid; /* owner's user id */ > __uint32_t di_gid; /* owner's group id */ > - uuid_t di_uuid; /* file unique id */ > + __uint32_t di_nlink; /* number of links to file */ > + __uint16_t di_projid; /* owner's project id */ > + __uint8_t di_pad[10]; /* unused, zeroed space */ > xfs_timestamp_t di_atime; /* time last accessed */ > xfs_timestamp_t di_mtime; /* time last modified */ > xfs_timestamp_t di_ctime; /* time created/inode modified */ > @@ -81,7 +85,13 @@ typedef struct xfs_dinode > > it was the redefinition of the di_uuid variable space that required > the bumping of the inode version... -- chandan