Re: [PATCH V3 07/12] xfs: Rename inode's extent counter fields based on their width

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 28 Sep 2021 at 05:16, Dave Chinner wrote:
> On Thu, Sep 16, 2021 at 03:36:42PM +0530, Chandan Babu R wrote:
>> This commit renames extent counter fields in "struct xfs_dinode" and "struct
>> xfs_log_dinode" based on the width of the fields. As of this commit, the
>> 32-bit field will be used to count data fork extents and the 16-bit field will
>> be used to count attr fork extents.
>> 
>> This change is done to enable a future commit to introduce a new 64-bit extent
>> counter field.
>> 
>> Signed-off-by: Chandan Babu R <chandan.babu@xxxxxxxxxx>
>> ---
>>  fs/xfs/libxfs/xfs_format.h      |  8 ++++----
>>  fs/xfs/libxfs/xfs_inode_buf.c   |  4 ++--
>>  fs/xfs/libxfs/xfs_log_format.h  |  4 ++--
>>  fs/xfs/scrub/inode_repair.c     |  4 ++--
>>  fs/xfs/scrub/trace.h            | 14 +++++++-------
>>  fs/xfs/xfs_inode_item.c         |  4 ++--
>>  fs/xfs/xfs_inode_item_recover.c |  8 ++++----
>>  7 files changed, 23 insertions(+), 23 deletions(-)
>> 
>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index dba868f2c3e3..87c927d912f6 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -802,8 +802,8 @@ typedef struct xfs_dinode {
>>  	__be64		di_size;	/* number of bytes in file */
>>  	__be64		di_nblocks;	/* # of direct & btree blocks used */
>>  	__be32		di_extsize;	/* basic/minimum extent size for file */
>> -	__be32		di_nextents;	/* number of extents in data fork */
>> -	__be16		di_anextents;	/* number of extents in attribute fork*/
>> +	__be32		di_nextents32;	/* number of extents in data fork */
>> +	__be16		di_nextents16;	/* number of extents in attribute fork*/
>
>
> Hmmm. Having the same field in the inode hold the extent count
> for different inode forks based on a bit in the superblock means the
> on-disk inode format is not self describing. i.e. we can't decode
> the on-disk contents of an inode correctly without knowing whether a
> specific feature bit is set in the superblock or not.
>
> Right now we don't have use external configs to decode the inode.
> Feature level conditional fields are determined by inode version,
> not superblock bits. Optional feature fields easy to deal with -
> zero if the feature is not in use, otherwise we assume it is in use
> and can validity check it appropriately. IOWs, we don't need
> to look at sb feature bits to decode and validate inode fields.
>
> This change means that we can't determine if the extent counts are
> correct just by looking at the on-disk inode. If we just have
> di_nextents32 set to a non-zero value, does that mean we should have
> data fork extents or attribute fork extents present?
>
> Just looking at whether the attr fork is initialised is not
> sufficient - it can be initialised with zero attr fork extents
> present.  We can't look at the literal area contents, either,
> because we don't zero that when we shrink it. We can't look at
> di_nblocks, because that counts both attr and data for blocks. We
> can't look at di_size, because we can have data extents beyond EOF
> and hence a size of zero doesn't mean the data fork is empty.
>
> So if both forks are in extent format, they could be either both
> empty, both contain extents or only one fork contains extents but we
> can't tell which state is the correct one. Hence
> if di_nextents64 if zero, we don't know if di_nextents32 is a count
> of attribute extents or data extents without first looking at the
> superblock feature bit to determine if di_nextents64 is in use or
> not. The inode format is not self describing anymore.
>
> When XFS introduced 32 bit link counts, the inode version was bumped
> from v1 to v2 because it redefined fields in the inode structure
> similar to this proposal[1]. The verison number was then used to
> determine if the inode was in old or new format - it was a self
> describing format change. Hence If we are going to redefine
> di_nextents to be able to hold either data fork extent count (old
> format) or attr fork extent count (new format) we really need to
> bump the inode version so that we can discriminate between the two
> inode formats just by looking at the inode itself.
>
> If we don't want to bump the version, then we need to do something
> like:
>
> -	__be32		di_nextents;	/* number of extents in data fork */
> -	__be16		di_anextents;	/* number of extents in attribute fork*/
> +	__be32		di_nextents_old;/* old number of extents in data fork */
> +	__be16		di_anextents_old;/* old number of extents in attribute fork*/
> .....
> -	__u8            di_pad2[12];
> +	__be64		di_nextents;	/* number of extents in data fork */
> +	__be32		di_anextents;	/* number of extents in attribute fork*/
> +	__u8            di_pad2[4];
>
> So that there is no ambiguity in the on-disk format between the two
> formats - if one set is non-zero, the other set must be zero in this
> sort of setup.
>
> However, I think that redefining the fields and bumping the inode
> version is the better long term strategy, as it allows future reuse
> of the di_anextents_old field, and it uses less of the small amount
> of unused padding we have remaining in the on-disk inode core.
>
> At which point, the feature bit in the superblock becomes "has v4
> inodes", not "has big extent counts". We then use v4 inode format in
> memory for everything (i.e. 64 bit extent counts) and convert
> to/from the ondisk format at IO time like we do with v1/v2 inodes.
>
> Thoughts?

The patch "xfs: Extend per-inode extent counter widths" (which appears later
in the series) adds the new per-inode flag XFS_DIFLAG2_NREXT64. This flag is
set on inodes which use 64-bit data fork extent counter and 32-bit attr fork
extent counter fields. Verifiers can check for the presence/absence of this
flag to determine which extent counter fields to use for verification of an
xfs_dinode structure.

Hence, XFS_DIFLAG2_NREXT64 flag should be sufficient for maintaining the self
describing nature of XFS inodes right?

>
> -Dave.
>
> [1] The change to v2 inodes back in 1995 removed the filesystem UUID
> from the inode and was replaced with a 32 bit link counter, a project ID
> value and padding:
>
> @@ -36,10 +38,12 @@ typedef struct xfs_dinode_core
>         __uint16_t      di_mode;        /* mode and type of file */
>         __int8_t        di_version;     /* inode version */
>         __int8_t        di_format;      /* format of di_c data */
> -       __uint16_t      di_nlink;       /* number of links to file */
> +       __uint16_t      di_onlink;      /* old number of links to file */
>         __uint32_t      di_uid;         /* owner's user id */
>         __uint32_t      di_gid;         /* owner's group id */
> -       uuid_t          di_uuid;        /* file unique id */
> +       __uint32_t      di_nlink;       /* number of links to file */
> +       __uint16_t      di_projid;      /* owner's project id */
> +       __uint8_t       di_pad[10];     /* unused, zeroed space */
>         xfs_timestamp_t di_atime;       /* time last accessed */
>         xfs_timestamp_t di_mtime;       /* time last modified */
>         xfs_timestamp_t di_ctime;       /* time created/inode modified */
> @@ -81,7 +85,13 @@ typedef struct xfs_dinode
>
> it was the redefinition of the di_uuid variable space that required
> the bumping of the inode version...


-- 
chandan



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux