Re: xfs_repair fails with err 117. Can I fix the fs or recover individual files somehow?

On Fri, Jun 23, 2023 at 6:14 PM Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>
> On 6/23/23 3:25 PM, Fernando CMK wrote:
> > Scenario
> >
> > opensuse 15.5, the fs was originally created on an earlier opensuse
> > release. The failed file system is on top of a mdadm raid 5, where
> > other xfs file systems were also created, but only this one is having
> > issues. The others are doing fine.
> >
> > xfs_repair and xfs_repair -L both fail:
>
> Full logs please, not the truncated version.

Phase 1 - find and verify superblock...
       - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
       - zero log...
       - 16:14:46: zeroing log - 128000 of 128000 blocks done
       - scan filesystem freespace and inode maps...
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0xfa00000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x0/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x6d600000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x23280000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x1b580000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x27100000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x7d00000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x3e80000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0xbb80000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x13880000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x1f400000/0x1000
stripe width (17591899783168) is too large
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x3a980000/0x1000

stripe width (17591899783168) is too large
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x4a380000/0x1000
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x55f00000/0x1000

Metadata corruption detected at 0x55f819658658, xfs_sb block 0x2ee00000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x3e800000/0x1000
stripe width (17591899783168) is too large
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x4e200000/0x1000

Metadata corruption detected at 0x55f819658658, xfs_sb block 0x69780000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x2af80000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x61a80000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x79180000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x32c80000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x59d80000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x65900000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x36b00000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x46500000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x71480000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x52080000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x42680000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x5dc00000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x17700000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x75300000/0x1000
clearing needsrepair flag and regenerating metadata
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x7d000000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x84d00000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x88b80000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x8ca00000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x90880000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x98580000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x9c400000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x80e80000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0xa0280000/0x1000
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658658, xfs_sb block 0x94700000/0x1000
       - 16:14:46: scanning filesystem freespace - 42 of 42 allocation groups done
       - found root inode chunk
Phase 3 - for each AG...
       - scan and clear agi unlinked lists...
       - 16:14:46: scanning agi unlinked lists - 42 of 42 allocation groups done
       - process known inodes and perform inode discovery...
       - agno = 0
       - agno = 15
       - agno = 30
       - agno = 16
       - agno = 17
       - agno = 31
       - agno = 18
       - agno = 19
       - agno = 20
       - agno = 32
       - agno = 33
       - agno = 21
       - agno = 34
       - agno = 35
       - agno = 36
       - agno = 37
       - agno = 38
       - agno = 39
       - agno = 40
       - agno = 41
       - agno = 22
       - agno = 23
       - agno = 24
       - agno = 25
       - agno = 26
       - agno = 27
       - agno = 28
       - agno = 29
       - agno = 1
       - agno = 2
       - agno = 3
       - agno = 4
       - agno = 5
       - agno = 6
       - agno = 7
       - agno = 8
       - agno = 9
       - agno = 10
       - agno = 11
       - agno = 12
       - agno = 13
       - agno = 14
       - 16:15:10: process known inodes and inode discovery - 788480 of 788480 inodes done
       - process newly discovered inodes...
       - 16:15:10: process newly discovered inodes - 42 of 42 allocation groups done
Phase 4 - check for duplicate blocks...
       - setting up duplicate extent list...
       - 16:15:10: setting up duplicate extent list - 42 of 42 allocation groups done
       - check for inodes claiming duplicate blocks...
       - agno = 0
       - agno = 5
       - agno = 2
       - agno = 3
       - agno = 8
       - agno = 4
       - agno = 9
       - agno = 10
       - agno = 7
       - agno = 6
       - agno = 11
       - agno = 1
       - agno = 12
       - agno = 13
       - agno = 15
       - agno = 14
       - agno = 16
       - agno = 17
       - agno = 18
       - agno = 19
       - agno = 20
       - agno = 21
       - agno = 22
       - agno = 23
       - agno = 24
       - agno = 25
       - agno = 26
       - agno = 27
       - agno = 28
       - agno = 29
       - agno = 30
       - agno = 31
       - agno = 32
       - agno = 33
       - agno = 34
       - agno = 35
       - agno = 36
       - agno = 37
       - agno = 38
       - agno = 39
       - agno = 40
       - agno = 41
       - 16:15:10: check for inodes claiming duplicate blocks - 788480 of 788480 inodes done
Phase 5 - rebuild AG headers and trees...
       - 16:15:19: rebuild AG headers and trees - 42 of 42 allocation groups done
       - reset superblock...
Phase 6 - check inode connectivity...
       - resetting contents of realtime bitmap and summary inodes
       - traversing filesystem ...
       - traversal finished ...
       - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
       - 16:15:34: verify and correct link counts - 42 of 42 allocation groups done
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658468, xfs_sb block 0x0/0x1000
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x8
stripe width (17591899783168) is too large
Metadata corruption detected at 0x55f819658468, xfs_sb block 0x0/0x1000
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x8
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!

fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.
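
A side note on the error code: err=117 in the fatal error above and the
"Structure needs cleaning" text that mount(2) returns are the same errno,
EUCLEAN, which XFS uses for EFSCORRUPTED. A minimal Python sketch to look it
up, assuming a Linux host (errno numbers are platform specific, and the helper
name describe_errno is made up for illustration):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name or None, message text) for an errno value."""
    code = abs(code)  # the kernel logs errors as negative numbers, e.g. -117
    return errno.errorcode.get(code), os.strerror(code)


name, message = describe_errno(-117)
print(name, "-", message)
```

On a non-Linux system the name may come back as None, since 117 is not
guaranteed to map to EUCLEAN there.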



>
> > Phase 6 - check inode connectivity...
> >         - resetting contents of realtime bitmap and summary inodes
> >         - traversing filesystem ...
> >         - traversal finished ...
> >         - moving disconnected inodes to lost+found ...
> > Phase 7 - verify and correct link counts...
> >         - 16:15:34: verify and correct link counts - 42 of 42 allocation groups done
> > stripe width (17591899783168) is too large
> > Metadata corruption detected at 0x55f819658468, xfs_sb block 0x0/0x1000
> > libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x8
> > stripe width (17591899783168) is too large
>
> 0xFFFEEF00000 - that's suspicious. No idea how the stripe unit could
> have been set to something so big.
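
The hex conversion quoted above is easy to check. A small Python sketch,
assuming the reported stripe width is in bytes and the block size is 4096
(bytes 4-7 of the superblock dump further down read 00 00 10 00, and bytes
8-15 give the block count); the variable names are mine:

```python
reported_bytes = 17591899783168  # value printed by xfs_repair and the kernel
block_size = 4096                # assumed: sb_blocksize from the superblock dump
fs_blocks = 0x14500000           # assumed: sb_dblocks from the same dump

# The byte value in hex, as quoted in the reply.
print(hex(reported_bytes))

# Convert to filesystem blocks; the remainder shows it is block aligned.
width_blocks, remainder = divmod(reported_bytes, block_size)
print(hex(width_blocks), remainder)

# The block count still fits in an unsigned 32-bit field, yet it exceeds
# the total number of blocks in the file system many times over.
print(width_blocks < 2**32, width_blocks > fs_blocks)
```

So the value is not random noise in the upper bits of a 64-bit number; in
blocks it is a 32-bit quantity just below 2^32.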
>
> > Metadata corruption detected at 0x55f819658468, xfs_sb block 0x0/0x1000
> > libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x8
> > xfs_repair: Releasing dirty buffer to free list!
> > xfs_repair: Refusing to write a corrupt buffer to the data device!
> > xfs_repair: Lost a write to the data device!
> >
> > fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.
> >
> > I ran xfs_repair multiple times, but I always get the same error.
>
> First, what version of xfs_repair are you using? (xfs_repair -V)
> The latest release roughly tracks the latest kernel version, 6.x.
>
> > Is there any way to fix the above?
> >
> > I tried xfs_db on an image file I created from the file system, and I
> > can see individual paths and files reported as "good":
>
> > xfs_db> path /certainpath
> > xfs_db> ls
> > 10         1550204032         directory      0x0000002e   1 . (good)
> > 12         1024               directory      0x0000172e   2 .. (good)
> > 25         1613125696         directory      0x99994f93  13 .AfterShotPro (good)
> >
> >
> > Is there a way to extract files from the file system image without
> > mounting the fs ? Or is there a way to mount the file system
> > regardless of its state?
>
> mount -o ro,norecovery should get you something ...


nope :(

# mount ./disk-dump  -t xfs -o ro,norecovery /mnt
mount: /mnt: mount(2) system call failed: Structure needs cleaning.

# xfs_repair -V
xfs_repair version 5.13.0

kernel version:

5.14.21-150500.53-default #1 SMP PREEMPT_DYNAMIC Wed May 10 07:56:26 UTC 2023 (b630043) x86_64 x86_64 x86_64 GNU/Linux

>
> > Trying a regular mount, with or without -o norecovery, I get:
> > mount: /mnt: mount(2) system call failed: Structure needs cleaning.
>
> ... oh. And what did the kernel dmesg say when that happened?

dmesg:

[ 1565.659025] XFS (loop6): SB validate failed with error -117.
[ 1590.584851] loop6: detected capacity change from 0 to 2726297600
[ 1590.585544] XFS (loop6): stripe width (17591899783168) is too large
[ 1590.585555] XFS (loop6): Metadata corruption detected at xfs_sb_read_verify+0xf6/0x160 [xfs], xfs_sb block 0xffffffffffffffff
[ 1590.585787] XFS (loop6): Unmount and run xfs_repair
[ 1590.585803] XFS (loop6): First 128 bytes of corrupted metadata buffer:
[ 1590.585819] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 14 50 00 00  XFSB.........P..
[ 1590.585838] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 1590.585854] 00000020: 25 eb 8c c0 aa ad 4e d8 88 92 2b 42 d8 a2 be c3  %.....N...+B....
[ 1590.585868] 00000030: 00 00 00 00 08 00 00 08 00 00 00 00 00 00 04 00  ................
[ 1590.585882] 00000040: 00 00 00 00 00 00 04 01 00 00 00 00 00 00 04 02  ................
[ 1590.585896] 00000050: 00 00 00 01 00 7d 00 00 00 00 00 2a 00 00 00 00  .....}.....*....
[ 1590.585911] 00000060: 00 01 f4 00 bd a5 10 00 02 00 00 08 00 00 00 00  ................
[ 1590.585925] 00000070: 00 00 00 00 00 00 00 00 0c 0c 09 03 17 00 00 19  ................
[ 1590.585951] XFS (loop6): SB validate failed with error -117.
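
For reference, the 128 bytes in the dump can be decoded by hand. A minimal
Python sketch, assuming the standard XFS on-disk superblock layout with
big-endian fields; the key names follow the kernel's sb_* field names with the
prefix dropped. The stripe unit and stripe width fields sit beyond the first
128 bytes, so the bad value itself cannot be read from this dump.

```python
import struct

# First 128 bytes of the superblock, copied from the dmesg output.
DUMP = """
58 46 53 42 00 00 10 00 00 00 00 00 14 50 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
25 eb 8c c0 aa ad 4e d8 88 92 2b 42 d8 a2 be c3
00 00 00 00 08 00 00 08 00 00 00 00 00 00 04 00
00 00 00 00 00 00 04 01 00 00 00 00 00 00 04 02
00 00 00 01 00 7d 00 00 00 00 00 2a 00 00 00 00
00 01 f4 00 bd a5 10 00 02 00 00 08 00 00 00 00
00 00 00 00 00 00 00 00 0c 0c 09 03 17 00 00 19
"""
raw = bytes.fromhex("".join(DUMP.split()))

# Assumed layout of the first 128 bytes (all big-endian).
FMT = ">4sIQQQ16sQQQQIIIIIHHHH12s8B"
NAMES = (
    "magicnum", "blocksize", "dblocks", "rblocks", "rextents", "uuid",
    "logstart", "rootino", "rbmino", "rsumino",
    "rextsize", "agblocks", "agcount", "rbmblocks", "logblocks",
    "versionnum", "sectsize", "inodesize", "inopblock", "fname",
    "blocklog", "sectlog", "inodelog", "inopblog", "agblklog",
    "rextslog", "inprogress", "imax_pct",
)
sb = dict(zip(NAMES, struct.unpack(FMT, raw)))

# Convert block counts to 512-byte sectors, the unit used by the
# "xfs_sb block" addresses and by the loop device capacity message.
sectors_per_block = sb["blocksize"] // 512
total_sectors = sb["dblocks"] * sectors_per_block
ag_sectors = sb["agblocks"] * sectors_per_block

for key in ("magicnum", "blocksize", "dblocks", "agblocks", "agcount",
            "logblocks", "rootino", "sectsize", "inodesize"):
    print(f"{key:10} = {sb[key]}")
print("total 512-byte sectors   :", total_sectors)
print("AG 1 superblock at sector:", hex(ag_sectors))
print("last AG superblock sector:", hex((sb["agcount"] - 1) * ag_sectors))
```

With this dump the fields look sane: 42 allocation groups and a 128000-block
log, both matching the xfs_repair output, and 2726297600 sectors in total,
matching the loop6 capacity line. The per-AG superblock addresses also line up
with the 0x3e80000 and 0xa0280000 addresses in the repair log, so the damage
appears limited to fields further into the superblock.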

>
> What happened in between this filesystem being ok, and not being ok?
> What was the first sign of trouble?

I did an openSUSE distribution upgrade from 15.3 to 15.4, then upgraded
again to 15.5, which is where I am now. The file system was already broken
at the first boot into 15.4: I had to log in in maintenance mode and comment
out its mount entry in fstab.


>
> If you want to provide an xfs_metadump (compressed, on gdrive or
> something, you can email me off-list) I can take a look.

Let me see if I can do that.

>
> -Eric
>
> >
> >
> >
> >
> > Regards.
> >
>



