On Thu, Feb 07, 2019 at 09:25:13PM -0500, David T-G wrote:
> Brian, et al --
> 
> ...and then Brian Foster said...
> % 
> % On Thu, Feb 07, 2019 at 08:25:34AM -0500, David T-G wrote:
> % > 
> % > I have a four-disk RAID5 volume with an ~11T filesystem that suddenly
> % > won't mount
> ...
> % > when poking, I at first thought that this was a RAID issue, but all of
> % > the md reports look good and apparently the GPT table issue is common, so
> % > I'll leave all of that out unless someone asks for it.
> % 
> % I'd be curious if the MD metadata format contends with GPT metadata. Is
> % the above something you've ever tried before running into this problem
> % and thus can confirm whether it preexisted the mount problem or not?
> 
> There's a lot I don't know, so it's quite possible that it doesn't line
> up.  Here's what mdadm tells me:
> 
> diskfarm:root:6:~> mdadm --detail /dev/md0
> /dev/md0:
>         Version : 1.2
>   Creation Time : Mon Feb 6 00:56:35 2017
>      Raid Level : raid5
>      Array Size : 11720265216 (11177.32 GiB 12001.55 GB)
>   Used Dev Size : 3906755072 (3725.77 GiB 4000.52 GB)
>    Raid Devices : 4
>   Total Devices : 4
>     Persistence : Superblock is persistent
> 
>     Update Time : Fri Jan 25 03:32:18 2019
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>            Name : diskfarm:0  (local to host diskfarm)
>            UUID : ca7008ef:90693dae:6c231ad7:08b3f92d
>          Events : 48211
> 
>     Number   Major   Minor   RaidDevice State
>        0       8       17        0      active sync   /dev/sdb1
>        1       8       65        1      active sync   /dev/sde1
>        3       8       81        2      active sync   /dev/sdf1
>        4       8        1        3      active sync   /dev/sda1
> diskfarm:root:6:~> 
> diskfarm:root:6:~> for D in a1 b1 e1 f1 ; do mdadm --examine /dev/sd$D | egrep "$D|Role|State|Checksum|Events" ; done
> /dev/sda1:
>           State : clean
>     Device UUID : f05a143b:50c9b024:36714b9a:44b6a159
>        Checksum : 4561f58b - correct
>          Events : 48211
>     Device Role : Active device 3
>     Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sdb1:
>           State : clean
>        Checksum : 4654df78 - correct
>          Events : 48211
>     Device Role : Active device 0
>     Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sde1:
>           State : clean
>        Checksum : c4ec7cb6 - correct
>          Events : 48211
>     Device Role : Active device 1
>     Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
> /dev/sdf1:
>           State : clean
>        Checksum : 349cf800 - correct
>          Events : 48211
>     Device Role : Active device 2
>     Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
> 
> Does that set off any alarms for you?
> 

It looks normal to me, but I'm not an MD person. I also don't think an
MD format / GPT format conflict is something that mdadm will show. It
may not appear until/unless you change the geometry on one side or the
other. Again, I'd strongly suggest validating your configuration with
linux-raid before making any such changes.
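FWIW, if you want a quick read-only sanity check of whether the GPT
layout and the MD metadata agree before taking it to linux-raid,
something along these lines might be a starting point (the device names
are lifted from your --detail output above, and sgdisk comes from the
gdisk/gptfdisk package, so substitute whatever partitioning tool you
prefer):

  # where does GPT think the partition starts and ends?
  sgdisk --print /dev/sdb

  # where does MD put its superblock and data inside that partition?
  mdadm --examine /dev/sdb1 | egrep 'Offset|Dev Size'

The point is just to confirm that the MD data offset and the GPT
structures don't step on each other before anyone changes the geometry.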
> 
> % 
> % If not, I'd suggest some more investigation into this before you make
> % any future partition or raid changes to this storage. I thought there
> % were different MD formats to accommodate precisely this sort of
> % incompatibility, but I don't know for sure. linux-raid is probably more
> % of a help here.
> 
> Thanks :-)  I have no plans to partition, but I will eventually want to
> grow it, so I'll definitely have to check on that.
> 
> 
> % 
> % > dmesg reports some XFS problems
> % > 
> % > diskfarm:root:5:~> dmesg | egrep 'md[:/0]'
> ...
> % > [ 202.230961] XFS (md0p1): Mounting V4 Filesystem
> % > [ 203.182567] XFS (md0p1): Torn write (CRC failure) detected at log block 0x3397e8. Truncating head block from 0x3399e8.
> % > [ 203.367581] XFS (md0p1): failed to locate log tail
> % > [ 203.367587] XFS (md0p1): log mount/recovery failed: error -74
> % > [ 203.367712] XFS (md0p1): log mount failed
> ...
> % 
> % Hmm. So part of the on-disk log is invalid. We attempt to deal with this
> ...
> % I'd guess that the torn write is due to interleaving log writes across
> % raid devices or something, but we can't really tell from just this.
> 
> The filesystem *shouldn't* see that there are distinct devices under
> there, since that's handled by the md driver, but there's STILL a lot
> that I don't know :-)
> 

It doesn't see multiple devices, but it does see a contiguous range of
filesystem blocks (such as the fs log) that happens to be mapped to
multiple physical devices by the underlying storage layer.

> 
> % 
> % > diskfarm:root:4:~> xfs_repair -n /dev/disk/by-label/4Traid5md 2>&1 | egrep -v 'agno = '
> ...
> % > - scan filesystem freespace and inode maps...
> % > sb_fdblocks 471930978, counted 471939170
> % 
> % The above said, the corruption here looks extremely minor. You basically
> ...
> % scans and not much else going on.
> 
> That sounds hopeful! :-)
> 
> 
> % 
> % > - 09:18:47: scanning filesystem freespace - 48 of 48 allocation groups done
> ...
> % > Phase 7 - verify link counts...
> % > - 09:34:02: verify and correct link counts - 48 of 48 allocation groups done
> % > No modify flag set, skipping filesystem flush and exiting.
> % > 
> % > is the trimmed output that can fit on one screen. Since I don't have a
> ...
> % 
> % What do you mean by trimmed output? Was there more output from
> % xfs_repair that is not shown here?
> 
> Yes. Note the
> 
>   | egrep -v 'agno = '
> 
> on the command line above. The full output
> 
> diskfarm:root:4:~> xfs_repair -n /dev/disk/by-label/4Traid5md >/tmp/xfs_repair.out 2>&1
> diskfarm:root:4:~> wc -l /tmp/xfs_repair.out
> 124 /tmp/xfs_repair.out
> 
> was quite long. Shall I attach that file or post a link?
> 

Please post the full repair output.

> 
> % 
> % In general, if you're concerned about what xfs_repair might do to a
> % particular filesystem you can always do a normal xfs_repair run against
> % a metadump of the filesystem before the original copy. Collect a
> % metadump of the fs:
> % 
> % xfs_metadump -go <dev> <outputmdimg>
> 
> Hey, cool! I like that :-) It generated a sizeable output file
> 
> diskfarm:root:8:~> xfs_metadump /dev/disk/by-label/4Traid5md /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out >/mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr 2>&1
> diskfarm:root:8:~> ls -goh /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out
> -rw-r--r-- 1 3.5G Feb 7 17:57 /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out
> diskfarm:root:8:~> wc -l /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr
> 239 /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr
> 
> as well as quite a few errors. Here
> 
> diskfarm:root:8:~> head /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr
> xfs_metadump: error - read only 0 of 4096 bytes
> xfs_metadump: error - read only 0 of 4096 bytes
> xfs_metadump: cannot init perag data (5). Continuing anyway.
> xfs_metadump: error - read only 0 of 4096 bytes
> xfs_metadump: cannot read dir2 block 39/132863 (2617378559)
> xfs_metadump: error - read only 0 of 4096 bytes
> xfs_metadump: cannot read dir2 block 41/11461784 (2762925208)
> xfs_metadump: error - read only 0 of 4096 bytes
> xfs_metadump: cannot read dir2 block 41/4237562 (2755700986)
> xfs_metadump: error - read only 0 of 4096 bytes
> 
> diskfarm:root:8:~> tail /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out.stdout-stderr
> xfs_metadump: error - read only 0 of 4096 bytes
> xfs_metadump: cannot read superblock for ag 47
> xfs_metadump: error - read only 0 of 4096 bytes
> xfs_metadump: cannot read agf block for ag 47
> xfs_metadump: error - read only 0 of 4096 bytes
> xfs_metadump: cannot read agi block for ag 47
> xfs_metadump: error - read only 0 of 4096 bytes
> xfs_metadump: cannot read agfl block for ag 47
> xfs_metadump: Filesystem log is dirty; image will contain unobfuscated metadata in log.
> cache_purge: shake on cache 0x4ee1c0 left 117 nodes!?
> 
> is a glance at the contents. Should I post/paste the full copy?
> 

It couldn't hurt. Perhaps this suggests there are other issues beyond
what was shown in the original repair output.
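Also note that "error - read only 0 of 4096 bytes" means the read from
the block device came back short. Getting 0 bytes back often means the
read landed at or beyond the end of the device, though drive-level
errors are possible too, and either one would circle back to the
geometry question above. A few read-only checks that might narrow it
down (smartctl is from smartmontools, and the sd[abef]/md0p1 names are
taken from your earlier output, so adjust as needed):

  # did the kernel log I/O errors while the metadump ran?
  dmesg | egrep -i 'i/o error|sector'

  # are any of the member drives reporting failing health?
  for D in a b e f ; do smartctl -H /dev/sd$D ; done

  # is the block device at least as large as the fs thinks it is?
  blockdev --getsize64 /dev/md0p1
  xfs_db -r -c 'sb 0' -c 'p dblocks blocksize' /dev/md0p1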
> 
> % 
> % Note that the metadump collects everything except file data so it will
> % require a decent amount of space depending on how much metadata
> % populates your fs vs. data.
> % 
> % Then restore the metadump to a sparse file (on some other
> % filesystem/storage):
> % 
> % xfs_mdrestore -g <mdfile> <sparsefiletarget>
> 
> I tried this
> 
> diskfarm:root:11:~> dd if=/dev/zero bs=1 count=0 seek=4G of=/mnt/750Graid5md/tmp/4Traid5md.xfs_d.iso
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 6.7252e-05 s, 0.0 kB/s
> diskfarm:root:11:~> ls -goh /mnt/750Graid5md/tmp/4Traid5md.xfs_d.iso
> -rw-r--r-- 1 4.0G Feb 7 21:15 /mnt/750Graid5md/tmp/4Traid5md.xfs_d.iso
> diskfarm:root:11:~> xfs_mdrestore /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out /mnt/750Graid5md/tmp/4Traid5md.xfs_d.iso
> xfs_mdrestore: cannot set filesystem image size: File too large
> 
> and got an error :-( Should a 4G file be large enough for a 3.5G
> metadata dump?
> 

The target file is too large for the underlying filesystem to support.
Note that the restored image's (apparent) size will match the size of
the original fs, even though it may only consume ~3.5G worth of actual
space. What is the underlying fs? You might need to find somewhere else
to restore this file, e.g. another XFS fs (a rough sketch of what that
might look like is at the end of this mail).

Brian

> 
> % 
> % Then you can mount/xfs_repair the restored sparse image, see what
> % xfs_repair does, mount the before/after img, etc. Note again that file
> % data is absent from the restored metadata image so don't expect to be
> % able to look at file content in the metadump image.
> 
> Right. That sounds like a great middle step, though. Thanks!
> 
> 
> % 
> % Brian
> 
> 
> HAND
> 
> :-D
> -- 
> David T-G
> See http://justpickone.org/davidtg/email/
> See http://justpickone.org/davidtg/tofu.txt
> 
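P.S. Here's a rough sketch of the restore-and-inspect sequence I have
in mind, assuming you can park the image on an XFS (or other
large-sparse-file-friendly) scratch filesystem. The /mnt/scratch and
/mnt/test paths below are made up, so substitute your own:

  # No need to pre-create the file with dd; mdrestore sizes the target
  # itself (that's the step that failed above). The apparent size will
  # be ~11T, but only a few G of blocks should actually be allocated.
  xfs_mdrestore -g /mnt/750Graid5md/tmp/4Traid5md.xfs_d.out /mnt/scratch/4Traid5md.img
  ls -ls /mnt/scratch/4Traid5md.img    # first column = blocks actually used

  # dry run first, then a real repair, then poke around the result
  xfs_repair -n /mnt/scratch/4Traid5md.img
  xfs_repair /mnt/scratch/4Traid5md.img
  mount -o loop,ro /mnt/scratch/4Traid5md.img /mnt/test

Repair may well complain about the dirty log on the image just as it
would on the real fs; working out how to get past that on a throwaway
copy is exactly the point of the exercise.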