On 2/9/20 12:19 AM, John Jore wrote:
> Hi all,
>
> Not sure if this is the appropriate forum to reports xfs_repair bugs? If wrong, please point me in the appropriate direction?

This is the place.

> I have a corrupted XFS volume which mounts fine, but xfs_repair is unable to repair it and volume eventually shuts down due to metadata corruption if writes are performed.

what does dmesg say when it shuts down?

>
> Originally I used xfs_repair from CentOS 8.1.1911, but cloned latest xfs_repair from git://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git (Today, Feb 9th, reports as version 5.4.0)
>
>
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - 16:08:04: scanning agi unlinked lists - 64 of 64 allocation groups done
>         - process known inodes and perform inode discovery...
>         - agno = 45
>         - agno = 15
>         - agno = 0
>         - agno = 30
>         - agno = 60
>         - agno = 46
>         - agno = 16
> Metadata corruption detected at 0x4330e3, xfs_inode block 0x17312a3f0/0x2000
>         - agno = 61
>         - agno = 31
>         - agno = 47
>         - agno = 62
>         - agno = 48
>         - agno = 49
>         - agno = 32
>         - agno = 33
>         - agno = 17
>         - agno = 1
> bad magic number 0x0 on inode 18253615584
> bad version number 0x0 on inode 18253615584
> bad magic number 0x0 on inode 18253615585
> bad version number 0x0 on inode 18253615585
> bad magic number 0x0 on inode 18253615586
> .....
> bad magic number 0x0 on inode 18253615584, resetting magic number
> bad version number 0x0 on inode 18253615584, resetting version number
> bad magic number 0x0 on inode 18253615585, resetting magic number
> bad version number 0x0 on inode 18253615585, resetting version number
> bad magic number 0x0 on inode 18253615586, resetting magic number
> bad version number 0x0 on inode 18253615586, resetting version number

Looks like a whole chunk of inodes with at least 0 magic numbers.

> ....
>         - agno = 16
>         - agno = 17
> Metadata corruption detected at 0x4330e3, xfs_inode block 0x17312a3f0/0x2000
>         - agno = 18
>         - agno = 19
> ...
> Phase 7 - verify and correct link counts...
>         - 16:10:41: verify and correct link counts - 64 of 64 allocation groups done
> Metadata corruption detected at 0x433385, xfs_inode block 0x17312a3f0/0x2000
> libxfs_writebufr: write verifier failed on xfs_inode bno 0x17312a3f0/0x2000

This bit seems problematic, I guess it's unable to write the updated inode buffer, due to some corruption, which presumably is why you keep tripping over the same corruption each time.

> releasing dirty buffer (bulk) to free list!
>
>
>
> Does not matter how many times, I've lost count, I re-run xfs_repair, with, or without -d,

-d is for repairing a filesystem while mounted. I hope you are not doing that, are you?

> it never does repair the volume.
> Volume is a ~12GB LV build using 4x 4TB disks in RAID 5 using a 3Ware 9690SA controller.

Just to double check, are there any storage errors reported in dmesg?

> Any suggestions or additional data I can provide?

If you are willing to provide an xfs_metadump to me (off-list) I will see if I can reproduce it from the metadump.

# xfs_metadump /dev/$WHATEVER metadump.img
# bzip2 metadump.img

-Eric

>
> John
>