On Fri, Nov 10, 2023 at 01:03:49PM +1100, Dave Chinner wrote:
> On Fri, Nov 10, 2023 at 09:36:51AM +0800, Zorro Lang wrote:
> > The g/047 still fails with this 2nd patch. So I did below steps [1],
> > and get the trace output as [2], those dump_inodes() messages you
> > added have been printed, please check.
>
> And that points me at the bug.
>
> dump_inodes: disk ino 0x83: init nblocks 0x8 nextents 0x0/0x0 anextents 0x0/0x0 v3_pad 0x0 nrext64_pad 0x0 di_flags2 0x18
> dump_inodes: log ino 0x83: init nblocks 0x8 nextents 0x0/0x1 anextents 0x0/0x0 v3_pad 0x1 nrext64_pad 0x0 di_flags2 0x18 big
>                                                      ^^^^^^^
> The initial log inode is correct.
>
> dump_inodes: disk ino 0x83: pre nblocks 0x8 nextents 0x0/0x0 anextents 0x0/0x0 v3_pad 0x0 nrext64_pad 0x0 di_flags2 0x18
> dump_inodes: log ino 0x83: pre nblocks 0x8 nextents 0x0/0x0 anextents 0x0/0x0 v3_pad 0x0 nrext64_pad 0x0 di_flags2 0x18 big
>                                                     ^^^^^^^
>
> .... but on the second sample, it's been modified and the extent
> count has been zeroed? Huh, that is unexpected - what did that?
>
> Oh.
>
> Can you test the patch below and see if it fixes the issue? Keep
> the first verifier patch I sent, then apply the patch below. You can
> drop the debug traceprintk patch - the patch below should fix it.

Great, Dave! It works; the testing below [1] now passes on my s390x. Maybe I
should write Tested-by, but as the reporter, Reported-by might be more
appropriate, so...

Reported-by: Zorro Lang <zlang@xxxxxxxxxx>

Anyway, I'm doing a full round of fstests "auto" group testing on s390x with
this patch, to check whether there are more issues on big endian machines. If
you have any specific concern (testing condition) you'd like tested, please
tell me and I'll strengthen the testing of that part.

Thanks,
Zorro

[1]
# ./check generic/047 generic/039 generic/065
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/s390x ibm-z-507 6.6.0+ #1 SMP Wed Nov 8 12:51:20 EST 2023
MKFS_OPTIONS  -- -f -m crc=1,finobt=1,rmapbt=0,reflink=1,bigtime=1,inobtcount=1 /dev/loop1
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/loop1 /mnt/fstests/SCRATCH_DIR

generic/039        0s
generic/047 10s ...  10s
generic/065        1s
Ran: generic/039 generic/047 generic/065
Passed all 3 tests

>
> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
> xfs: recovery should not clear di_flushiter unconditionally
>
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> Because on v3 inodes, di_flushiter doesn't exist. It overlaps with
> zero padding in the inode, except when NREXT64=1 configurations are
> in use and the zero padding is no longer padding but holds the 64
> bit extent counter.
>
> This manifests obviously on big endian platforms (e.g. s390) because
> the log dinode is in host order and the overlap is the LSBs of the
> extent count field. It is not noticed on little endian machines
> because the overlap is at the MSB end of the extent count field and
> we need to get more than 2^48 extents in the inode before it
> manifests. i.e. the heat death of the universe will occur before we
> see the problem in little endian machines.
>
> This is a zero-day issue for NREXT64=1 configurations on big endian
> machines. Fix it by only clearing di_flushiter on v2 inodes during
> recovery.
>
> Fixes: 9b7d16e34bbe ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers")
> cc: stable@xxxxxxxxxx # 5.19+
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_inode_item_recover.c | 32 +++++++++++++++++---------------
>  1 file changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
> index f4c31c2b60d5..dbdab4ce7c44 100644
> --- a/fs/xfs/xfs_inode_item_recover.c
> +++ b/fs/xfs/xfs_inode_item_recover.c
> @@ -371,24 +371,26 @@ xlog_recover_inode_commit_pass2(
>  	 * superblock flag to determine whether we need to look at di_flushiter
>  	 * to skip replay when the on disk inode is newer than the log one
>  	 */
> -	if (!xfs_has_v3inodes(mp) &&
> -	    ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) {
> -		/*
> -		 * Deal with the wrap case, DI_MAX_FLUSH is less
> -		 * than smaller numbers
> -		 */
> -		if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH &&
> -		    ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) {
> -			/* do nothing */
> -		} else {
> -			trace_xfs_log_recover_inode_skip(log, in_f);
> -			error = 0;
> -			goto out_release;
> +	if (!xfs_has_v3inodes(mp)) {
> +		if (ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) {
> +			/*
> +			 * Deal with the wrap case, DI_MAX_FLUSH is less
> +			 * than smaller numbers
> +			 */
> +			if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH &&
> +			    ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) {
> +				/* do nothing */
> +			} else {
> +				trace_xfs_log_recover_inode_skip(log, in_f);
> +				error = 0;
> +				goto out_release;
> +			}
>  		}
> +
> +		/* Take the opportunity to reset the flush iteration count */
> +		ldip->di_flushiter = 0;
>  	}
>  
> -	/* Take the opportunity to reset the flush iteration count */
> -	ldip->di_flushiter = 0;
>  
>  	if (unlikely(S_ISREG(ldip->di_mode))) {
>  		if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) &&
>
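
For anyone following the thread, the endianness argument in the commit
message can be sketched in a few lines of C. This is only an illustrative
model, not kernel code: the names below (byte_order, store_u64, load_u64,
model_clear_flushiter) are made up for the example, and it assumes the
16-bit di_flushiter aliases the last two bytes of the 8-byte slot that
holds the 64-bit extent counter, which is what the 2^48 figure in the
commit message implies. The byte order is passed in explicitly so the
sketch behaves the same on any host.

```c
#include <stdint.h>

/* Hypothetical model: byte order of the in-memory (host order) log dinode. */
enum byte_order { ORDER_BIG, ORDER_LITTLE };

/* Lay out a 64-bit value in an 8-byte slot using the given byte order. */
static void store_u64(uint8_t slot[8], uint64_t v, enum byte_order order)
{
	for (int i = 0; i < 8; i++) {
		int shift = (order == ORDER_BIG) ? 8 * (7 - i) : 8 * i;

		slot[i] = (uint8_t)(v >> shift);
	}
}

/* Read the 64-bit value back from the slot using the same byte order. */
static uint64_t load_u64(const uint8_t slot[8], enum byte_order order)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++) {
		int shift = (order == ORDER_BIG) ? 8 * (7 - i) : 8 * i;

		v |= (uint64_t)slot[i] << shift;
	}
	return v;
}

/*
 * Model of the unconditional "di_flushiter = 0" store: zero the two bytes
 * at offsets 6 and 7 of the slot (assumed position of the 16-bit field),
 * then return what the 64-bit extent counter reads as afterwards.
 */
uint64_t model_clear_flushiter(uint64_t nextents, enum byte_order order)
{
	uint8_t slot[8];

	store_u64(slot, nextents, order);
	slot[6] = 0;
	slot[7] = 0;
	return load_u64(slot, order);
}
```

Under these assumptions, model_clear_flushiter(1, ORDER_BIG) returns 0,
matching the nextents 0x0/0x1 -> 0x0/0x0 change in the trace above, while
model_clear_flushiter(1, ORDER_LITTLE) still returns 1. On the little
endian side the count is only damaged once bits 48 and above are in use.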