Re: Regular FS shutdown while rsync is running

Lucas Stach <l.stach@xxxxxxxxxxxxxx> · Tue, 22 Jan 2019 11:39:53 +0100

Hi Brian,

On Monday, 21.01.2019, 13:11 -0500, Brian Foster wrote:
[...]
> > So for the moment, here's the output of the above sequence.
> > 
> > xfs_db> convert agno 5 agbno 7831662 fsb
> > 0x5077806e (1350008942)
> > xfs_db> fsb 0x5077806e
> > xfs_db> type finobt
> > xfs_db> print
> > magic = 0x49414233
> > level = 1
> > numrecs = 335
> > leftsib = 7810856
> > rightsib = null
> > bno = 7387612016
> > lsn = 0x6671003d9700
> > uuid = 026711cc-25c7-44b9-89aa-0aac496edfec
> > owner = 5
> > crc = 0xe12b19b2 (correct)
> 
> As expected, we have the inobt magic. Interesting that this is a fairly
> full intermediate (level > 0) node. There is no right sibling, which
> means we're at the far right end of the tree. I wouldn't mind poking
> around a bit more at the tree, but that might be easier with access to
> the metadump. I also think that xfs_repair would have complained were
> something more significant wrong with the tree.
> 
> Hmm, I wonder if the (lightly tested) diff below would help us catch
> anything. It basically just splits up the currently combined inobt and
> finobt I/O verifiers to expect the appropriate magic number (rather than
> accepting either magic for both trees). Could you give that a try?
> Unless we're doing something like using the wrong type of cursor for a
> particular tree, I'd think this would catch wherever we happen to put a
> bad magic on disk. Note that this assumes the underlying filesystem has
> been repaired so as to try and detect the next time an on-disk
> corruption is introduced.
> 
> You'll also need to turn up the XFS error level to make sure this prints
> out a stack trace if/when a verifier failure triggers:
> 
> echo 5 > /proc/sys/fs/xfs/error_level
> 
> I guess we also shouldn't rule out hardware issues or whatnot. I did
> notice you have a strange kernel version: 4.19.4-holodeck10. Is that a
> distro kernel? Has it been modified from upstream in any way? If so, I'd
> strongly suggest trying to confirm whether this is reproducible with an
> upstream kernel.
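
For reference, the difference between the combined check and the split verifiers comes down to the magic-number test. The following is a minimal C sketch, not the actual kernel diff (which is not reproduced here). The two constants are the V5 (CRC-enabled) inobt and finobt magics from the XFS on-disk format; the helper names combined_magic_ok() and split_magic_ok() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* V5 (CRC-enabled) inode btree magic numbers from the XFS on-disk format. */
#define XFS_IBT_CRC_MAGIC  0x49414233u /* "IAB3": inode btree (inobt) */
#define XFS_FIBT_CRC_MAGIC 0x46494233u /* "FIB3": free inode btree (finobt) */

/* Hypothetical helper modelling the combined check: either magic is
 * accepted, no matter which tree the block was read for. */
bool combined_magic_ok(uint32_t magic)
{
	return magic == XFS_IBT_CRC_MAGIC || magic == XFS_FIBT_CRC_MAGIC;
}

/* Hypothetical helper modelling the split check: only the magic of the
 * tree the caller expects is accepted, so an inobt block that turns up
 * where a finobt block should be is rejected. */
bool split_magic_ok(uint32_t magic, bool is_finobt)
{
	return magic == (is_finobt ? XFS_FIBT_CRC_MAGIC : XFS_IBT_CRC_MAGIC);
}
```

With the split check, a block carrying 0x49414233 ("IAB3") still passes when read as an inobt block but fails when read as a finobt block, which the combined check would have let through.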

With the finobt verifier changes applied, we are unable to mount the FS,
even after running xfs_repair.

xfs_repair had found "bad magic # 0x49414233 in inobt block 5/2631703",
which xfs_db translates to daddr 0x1b5db40b8. The mount, however, trips
over a buffer at a different daddr:

[   73.237007] XFS (dm-3): Mounting V5 Filesystem
[   73.456481] XFS (dm-3): Ending clean mount
[   74.132671] XFS (dm-3): Metadata corruption detected at xfs_finobt_verify+0x50/0x90 [xfs], xfs_finobt block 0x1b5df7d50 
[   74.133028] XFS (dm-3): Unmount and run xfs_repair
[   74.133184] XFS (dm-3): First 128 bytes of corrupted metadata buffer:
[   74.133395] 00000000e44dfb87: 49 41 42 33 00 01 01 50 00 07 53 58 ff ff ff ff  IAB3...P..SX....
[   74.133679] 000000009f21b317: 00 00 00 01 b5 df 7d 50 00 00 00 00 00 00 00 00  ......}P........
[   74.133964] 000000003429321b: 02 67 11 cc 25 c7 44 b9 89 aa 0a ac 49 6e df ec  .g..%.D.....In..
[   74.134272] 00000000fe79b835: 00 00 00 05 24 52 54 c6 32 dc 7d 00 32 e9 b9 a0  ....$RT.2.}.2...
[   74.134554] 00000000d1e887dc: 32 f4 97 80 33 01 36 80 33 09 ca 80 33 1e b7 80  2...3.6.3...3...
[   74.134852] 00000000612879d2: 33 2f 50 00 33 33 e8 80 33 40 a9 c0 33 4c 08 80  3/P.33..3@..3L..
[   74.135140] 00000000e63fd33a: 33 64 d7 80 33 79 34 40 33 8f 08 80 33 a7 be c0  3d..3y4@3...3...
[   74.135427] 00000000d1c405d7: 33 b6 10 80 33 bf 1e c0 33 d0 99 00 33 df cd 00  3...3...3...3...
[   74.135871] XFS (dm-3): Metadata corruption detected at xfs_finobt_verify+0x50/0x90 [xfs], xfs_finobt block 0x1b5df7d50 
[   74.136231] XFS (dm-3): Unmount and run xfs_repair
[   74.136390] XFS (dm-3): First 128 bytes of corrupted metadata buffer:
[   74.136604] 00000000e44dfb87: 49 41 42 33 00 01 01 50 00 07 53 58 ff ff ff ff  IAB3...P..SX....
[   74.136887] 000000009f21b317: 00 00 00 01 b5 df 7d 50 00 00 00 00 00 00 00 00  ......}P........
[   74.137174] 000000003429321b: 02 67 11 cc 25 c7 44 b9 89 aa 0a ac 49 6e df ec  .g..%.D.....In..
[   74.137463] 00000000fe79b835: 00 00 00 05 24 52 54 c6 32 dc 7d 00 32 e9 b9 a0  ....$RT.2.}.2...
[   74.137750] 00000000d1e887dc: 32 f4 97 80 33 01 36 80 33 09 ca 80 33 1e b7 80  2...3.6.3...3...
[   74.138035] 00000000612879d2: 33 2f 50 00 33 33 e8 80 33 40 a9 c0 33 4c 08 80  3/P.33..3@..3L..
[   74.138358] 00000000e63fd33a: 33 64 d7 80 33 79 34 40 33 8f 08 80 33 a7 be c0  3d..3y4@3...3...
[   74.138639] 00000000d1c405d7: 33 b6 10 80 33 bf 1e c0 33 d0 99 00 33 df cd 00  3...3...3...3...
[   74.138964] XFS (dm-3): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x1b5df7d50 len 8 error 117
[   78.489686] XFS (dm-3): Error -117 reserving per-AG metadata reserve pool.
[   78.489691] XFS (dm-3): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c.  Return address = 00000000b2beb4b0
[   78.489697] XFS (dm-3): Corruption of in-memory data detected.  Shutting down filesystem
[   78.489955] XFS (dm-3): Please umount the filesystem and rectify the problem(s)
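
To make the dump easier to read, here is a minimal C sketch that decodes its first 56 bytes as a V5 short-format btree block header, the layout shared by the inobt and finobt. The field offsets follow the XFS on-disk format; the struct, function and macro names are hypothetical, and the UUID and CRC fields (the CRC is stored little-endian) are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* V5 short-format btree block header (inobt/finobt), 56 bytes on disk:
 * magic(4) level(2) numrecs(2) leftsib(4) rightsib(4) blkno(8) lsn(8)
 * uuid(16) owner(4) crc(4). Everything except the CRC is big-endian. */
#define SBLOCK_V5_HDR_LEN 56
#define NULL_AGBLOCK      0xffffffffu /* sibling pointer meaning "none" */

struct sblock_hdr {
	uint32_t magic;
	uint16_t level;    /* 0 = leaf, > 0 = interior node */
	uint16_t numrecs;
	uint32_t leftsib;  /* AG-relative block numbers */
	uint32_t rightsib;
	uint64_t blkno;    /* daddr the block claims to live at */
	uint64_t lsn;
	uint32_t owner;    /* AG number */
};

/* Read an n-byte big-endian integer. */
static uint64_t load_be(const uint8_t *p, size_t n)
{
	uint64_t v = 0;

	for (size_t i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

struct sblock_hdr decode_sblock_hdr(const uint8_t *buf, size_t len)
{
	struct sblock_hdr h;

	assert(len >= SBLOCK_V5_HDR_LEN);
	h.magic    = (uint32_t)load_be(buf + 0, 4);
	h.level    = (uint16_t)load_be(buf + 4, 2);
	h.numrecs  = (uint16_t)load_be(buf + 6, 2);
	h.leftsib  = (uint32_t)load_be(buf + 8, 4);
	h.rightsib = (uint32_t)load_be(buf + 12, 4);
	h.blkno    = load_be(buf + 16, 8);
	h.lsn      = load_be(buf + 24, 8);
	/* bytes 32..47: filesystem UUID; bytes 52..55: CRC (skipped) */
	h.owner    = (uint32_t)load_be(buf + 48, 4);
	return h;
}

/* First 56 bytes of the buffer rejected at daddr 0x1b5df7d50 (from the log). */
const uint8_t dumped_hdr[SBLOCK_V5_HDR_LEN] = {
	0x49, 0x41, 0x42, 0x33, 0x00, 0x01, 0x01, 0x50,
	0x00, 0x07, 0x53, 0x58, 0xff, 0xff, 0xff, 0xff,
	0x00, 0x00, 0x00, 0x01, 0xb5, 0xdf, 0x7d, 0x50,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x02, 0x67, 0x11, 0xcc, 0x25, 0xc7, 0x44, 0xb9,
	0x89, 0xaa, 0x0a, 0xac, 0x49, 0x6e, 0xdf, 0xec,
	0x00, 0x00, 0x00, 0x05, 0x24, 0x52, 0x54, 0xc6,
};
```

Decoded this way, the buffer carries the inobt magic 0x49414233 ("IAB3") rather than the finobt magic 0x46494233 ("FIB3"), with level 1, 336 records, left sibling 480088, no right sibling, owner AG 5, a zero LSN, and a self-reported blkno equal to the daddr it was read from. Apart from the magic, the header fields are consistent with where the block was read from.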

Is this a real issue, or a false positive caused by things working
differently during early mount?
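
The AG-relative block numbers xfs_repair prints and the daddrs the kernel prints are related by simple arithmetic on the filesystem geometry. The C sketch below assumes 4 KiB blocks (8 basic blocks of 512 bytes each). Under that assumption, the quoted xfs_db output (AG 5, agbno 7831662 giving fsb 0x5077806e and daddr 7387612016) implies agblklog = 28 and agblocks = 183123968. These values are inferred, not read from the superblock, so they should be checked with xfs_db before being relied on; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED geometry, inferred from the quoted xfs_db output rather than
 * read from the superblock. */
#define BBS_PER_FSB 8u          /* 4 KiB block / 512-byte basic block */
#define AGBLKLOG    28u         /* bits used for agbno in an encoded fsb */
#define AGBLOCKS    183123968u  /* blocks per AG */

/* Encoded filesystem block number, as printed by xfs_db "convert ... fsb". */
uint64_t agbno_to_fsb(uint32_t agno, uint32_t agbno)
{
	return ((uint64_t)agno << AGBLKLOG) | agbno;
}

/* Linear disk address in 512-byte units, as printed in kernel messages. */
uint64_t agbno_to_daddr(uint32_t agno, uint32_t agbno)
{
	assert(agbno < AGBLOCKS);
	return ((uint64_t)agno * AGBLOCKS + agbno) * BBS_PER_FSB;
}

/* Inverse of agbno_to_daddr() for a block-aligned daddr. */
void daddr_to_agbno(uint64_t daddr, uint32_t *agno, uint32_t *agbno)
{
	uint64_t fsb_linear = daddr / BBS_PER_FSB;

	*agno = (uint32_t)(fsb_linear / AGBLOCKS);
	*agbno = (uint32_t)(fsb_linear % AGBLOCKS);
}
```

Under these assumptions, AG 5 block 2631703 maps to daddr 0x1b5db40b8, matching what xfs_db reported, while the buffer at daddr 0x1b5df7d50 that failed verification during mount maps back to AG 5 block 2666410, a different block in the same AG.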

Regards,
Lucas