Re: [Bug 216486] New: [xfstests generic/447] xfs_scrub always complains fs corruption

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 14, 2022 at 08:12:56AM +0000, bugzilla-daemon@xxxxxxxxxx wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216486
> 
>             Bug ID: 216486
>            Summary: [xfstests generic/447] xfs_scrub always complains  fs
>                     corruption
>            Product: File System
>            Version: 2.5
>     Kernel Version: 6.0.0-rc4+
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: XFS
>           Assignee: filesystem_xfs@xxxxxxxxxxxxxxxxxxxxxx
>           Reporter: zlang@xxxxxxxxxx
>         Regression: No
> 
> Recently xfstests generic/447 always fails[1][2][3] on latest xfs kernel with
> xfsprogs. It's reproducible on 1k blocksize and rmapbt enabled XFS (-b
> size=1024 -m rmapbt=1). Not sure if it's a kernel bug or a xfsprogs issue, or
> an expected failure.

It's an expected failure that is one of the many things fixed by the
online fsck patchset.  The solution I came up with is described here:
https://djwong.org/docs/xfs-online-fsck-design/#eventual-consistency-vs-online-fsck

The TLDR is that scrub is probably racing with a thread that's in the
middle of doing a file mapping change that involves both an rmap and a
refcount update.  This is possible because we don't hold the AGF buffer
between work items in a defer ops chain.

--D

> [1]
> SECTION       -- default
> FSTYP         -- xfs (non-debug)
> PLATFORM      -- Linux/x86_64 hp-xxxxxxxx-01
> 6.0.0-0.rc4.20220906git53e99dcff61e.32.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Wed
> Sep 7 07:51:49 UTC 2022
> MKFS_OPTIONS  -- -f -b size=1024 -m rmapbt=1 /dev/sda3
> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/scratch
> 
> generic/447 246s ... _check_xfs_filesystem: filesystem on /dev/sda3 failed
> scrub
> (see /root/git/xfstests/results//default/generic/447.full for details)
> 
> [2]
> # cat results//default/generic/447.full
> meta-data=/dev/sda3              isize=512    agcount=16, agsize=3276544 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=1
>          =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
> data     =                       bsize=1024   blocks=52424704, imaxpct=25
>          =                       sunit=256    swidth=256 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=1024   blocks=65536, version=2
>          =                       sectsz=512   sunit=256 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> creating 2097152 blocks...
> wrote 2147483648/2147483648 bytes at offset 0
> 2.000 GiB, 512 ops; 0:00:07.59 (269.766 MiB/sec and 67.4414 ops/sec)
> Punching file2...
> ...done
> _check_xfs_filesystem: filesystem on /dev/sda3 failed scrub
> *** xfs_scrub -v -d -n output ***
> EXPERIMENTAL xfs_scrub program in use! Use at your own risk!
> Phase 1: Find filesystem geometry.
> /mnt/scratch: using 1 threads to scrub.
> Phase 2: Check internal metadata.
> Corruption: AG 0 reference count btree: Repairs are required. (scrub.c line
> 196)
> Info: AG 1 superblock: Optimization is possible. (scrub.c line 212)
> Info: AG 2 superblock: Optimization is possible. (scrub.c line 212)
> Info: AG 3 superblock: Optimization is possible. (scrub.c line 212)
> Info: AG 4 superblock: Optimization is possible. (scrub.c line 212)
> Info: AG 5 superblock: Optimization is possible. (scrub.c line 212)
> Info: AG 6 superblock: Optimization is possible. (scrub.c line 212)
> Info: AG 7 superblock: Optimization is possible. (scrub.c line 212)
> Info: AG 8 superblock: Optimization is possible. (scrub.c line 212)
> Info: AG 9 superblock: Optimization is possible. (scrub.c line 212)
> Info: AG 10 superblock: Optimization is possible. (scrub.c line 212)
> Info: AG 11 superblock: Optimization is possible. (scrub.c line 212)
> Info: AG 12 superblock: Optimization is possible. (scrub.c line 212)
> Info: AG 13 superblock: Optimization is possible. (scrub.c line 212)
> Info: AG 14 superblock: Optimization is possible. (scrub.c line 212)
> Info: AG 15 superblock: Optimization is possible. (scrub.c line 212)
> Phase 3: Scan all inodes.
> Info: inode 512 (0/512) inode record: Cross-referencing failed. (scrub.c line
> 117)
> Info: inode 515 (0/515) inode record: Cross-referencing failed. (scrub.c line
> 117)
> Info: inode 517 (0/517) inode record: Cross-referencing failed. (scrub.c line
> 117)
> Info: inode 517 (0/517) data block map: Cross-referencing failed. (scrub.c line
> 117)
> Info: /mnt/scratch: Optimizations of inode record are possible. (scrub.c line
> 253)
> Phase 5: Check directory tree.
> Info: /mnt/scratch: Filesystem has errors, skipping connectivity checks.
> (phase5.c line 392)
> Phase 7: Check summary counters.
> 5.2GiB data used;  6 inodes used.
> 1.1GiB data found; 5 inodes found.
> 5 inodes counted; 6 inodes checked.
> /mnt/scratch: corruptions found: 1
> /mnt/scratch: Re-run xfs_scrub without -n.
> *** end xfs_scrub output
> 
> [3]
> # dmesg
> [329558.995550] run fstests generic/447 at 2022-09-13 14:01:24
> [329560.019866] systemd[1]: Started fstests-generic-447.scope - /usr/bin/bash
> -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj;
> exec ./tests/generic/447.
> [329561.466573] XFS (sda3): Mounting V5 Filesystem
> [329561.542655] XFS (sda3): Ending clean mount
> [329561.596681] XFS (sda3): Unmounting Filesystem
> [329561.598209] systemd[1]: mnt-scratch.mount: Deactivated successfully.
> [329562.183863] XFS (sda3): Mounting V5 Filesystem
> [329562.265873] XFS (sda3): Ending clean mount
> [329727.320231] systemd[1]: mnt-scratch.mount: Deactivated successfully.
> [329729.160375] XFS (sda3): Unmounting Filesystem
> [329730.480159] XFS (sda3): Mounting V5 Filesystem
> [329730.559529] XFS (sda3): Ending clean mount
> [329730.595342] systemd[1]: fstests-generic-447.scope: Deactivated
> successfully.
> [329730.597524] systemd[1]: fstests-generic-447.scope: Consumed 2min 44.321s
> CPU time.
> [329730.641904] XFS (sda5): Unmounting Filesystem
> [329730.644716] systemd[1]: mnt-test.mount: Deactivated successfully.
> [329730.899455] XFS (sda3): EXPERIMENTAL online scrub feature in use. Use at
> your own risk!
> [329743.405813] XFS (sda3): Corruption detected during scrub.
> [329743.922150] XFS (sda3): Corruption detected during scrub.
> [329744.438304] XFS (sda3): Corruption detected during scrub.
> [329744.956067] XFS (sda3): Corruption detected during scrub.
> [329745.472617] XFS (sda3): Corruption detected during scrub.
> [329745.988849] XFS (sda3): Corruption detected during scrub.
> [329746.505812] XFS (sda3): Corruption detected during scrub.
> [329747.022342] XFS (sda3): Corruption detected during scrub.
> [329747.538927] XFS (sda3): Corruption detected during scrub.
> [329748.055586] XFS (sda3): Corruption detected during scrub.
> [329748.572338] XFS (sda3): Corruption detected during scrub.
> [329911.911869] XFS (sda3): Unmounting Filesystem
> [329911.913058] XFS (sda3): Uncorrected metadata errors detected; please run
> xfs_repair.
> [329911.913588] systemd[1]: mnt-scratch.mount: Deactivated successfully.
> 
> -- 
> You may reply to this email to add a comment.
> 
> You are receiving this mail because:
> You are watching the assignee of the bug.



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux