[Compendium reply to all 3 emails]

On Sat, Jun 28, 2014 at 01:41:54AM +0200, Arkadiusz Miśkiewicz wrote:
>
> Hello.
>
> I have a fs (metadump of it
> http://ixion.pld-linux.org/~arekm/p2/x1/web2-home.metadump.gz)
> that xfs_repair 3.2.0 is unable to fix properly.
>
> Running xfs_repair few times shows the same errors repeating:
> http://ixion.pld-linux.org/~arekm/p2/x1/repair2.txt
> http://ixion.pld-linux.org/~arekm/p2/x1/repair3.txt
> http://ixion.pld-linux.org/~arekm/p2/x1/repair4.txt
> http://ixion.pld-linux.org/~arekm/p2/x1/repair5.txt
>
> (repair1.txt also exists - it was initial, very big/long repair)
>
> Note that fs mounts fine (and was mounting fine before and after repair) but
> xfs_repair indicates that not everything got fixed.
>
> Unfortunately there looks to be a problem with metadump image. xfs_repair is
> able to finish fixing on a restored image but is not able (see repairX.txt
> above) on real devices.

Huh?

> Examples of problems repeating each time xfs_repair is run:
>
> 1)
> reset bad sb for ag 5
> non-null group quota inode field in superblock 7

OK, so this is indicative of something screwed up a long time ago.
Firstly, the primary superblock shows:

	uquotino = 4077961
	gquotino = 0
	qflags = 0

i.e. user quota @ inode 4077961, no group quota. The secondary
superblocks that are being warned about show:

	uquotino = 0
	gquotino = 4077962
	qflags = 0

which is clearly wrong. They should have been overwritten during the
growfs operation to match the primary superblock.

The similarity in inode numbers leads me to believe that at some point
both user and group/project quotas were enabled on this filesystem,
but right now only user quotas are enabled. It's only AGs 1-15 that
show this, so it seems likely that this filesystem was originally only
16 AGs and has been grown many times since?

Oh, this all occurred because you had a growfs operation on 3.10 fail
because of garbage in the sb of AG 16 (i.e. this from IRC:
http://sprunge.us/UJFE)? IOWs, this commit:

	9802182 xfs: verify superblocks as they are read from disk

tripped up on sb 16. That means sb 16 was not modified by the growfs
operation, and so should have the pre-growfs information in it:

	uquotino = 4077961
	gquotino = 4077962
	qflags = 0x77

Yeah, that's what I thought - the previous grow operation had both
quotas enabled. OK, that explains why the growfs operation had issues,
but it doesn't explain exactly how the quota inodes got screwed up
like that. Anyway, the growfs issues were solved by:

	10e6e65 xfs: be more forgiving of a v4 secondary sb w/ junk in
	v5 fields

which landed in 3.13.

> 2)
> correcting nblocks for inode 965195858, was 19 - counted 20
> correcting nextents for inode 965195858, was 16 - counted 17

Which is preceded by:

	data fork in ino 965195858 claims free block 60323539
	data fork in ino 965195858 claims free block 60323532

and when combined with the later:

	entry "dsc0945153ac18d4d4f1a-150x150.jpg" (ino 967349800) in
	dir 965195858 is a duplicate name, marking entry to be junked

errors from that directory, it looks like the space was freed but the
directory btree was not correctly updated. No idea what might have
caused that, but it is a classic symptom of volatile write caches...

Hmmm, and when it goes to junk them in my local testing:

	rebuilding directory inode 965195858
	name create failed in ino 965195858 (117), filesystem may be
	out of space

which is an EFSCORRUPTED error trying to rebuild that directory.
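FWIW, if anyone wants to check whether repeated repair passes actually
converge without touching the real devices, restoring the metadump to
a sparse file and following the repair with a no-modify pass is the
easy way to do it. An untested sketch, with made-up paths:

	# restore the metadump to a sparse image file
	xfs_mdrestore web2-home.metadump /tmp/web2-home.img

	# full repair pass, then a no-modify check; in no-modify mode
	# xfs_repair exits non-zero if it still finds corruption
	xfs_repair /tmp/web2-home.img
	xfs_repair -n /tmp/web2-home.img || echo "corruption remains"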
The second repair pass did not throw an error, but it did not fix the
problem either, as a third pass still reported it. I'll look into why.

> 3) clearing some entries; moving to lost+found (the same files)
>
> 4)
> Phase 7 - verify and correct link counts...
> Invalid inode number 0xfeffffffffffffff
> xfs_dir_ino_validate: XFS_ERROR_REPORT
> Metadata corruption detected at block 0x11fbb698/0x1000
> libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> Invalid inode number 0xfeffffffffffffff
> xfs_dir_ino_validate: XFS_ERROR_REPORT
> Metadata corruption detected at block 0x11fbb698/0x1000
> libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> done

Not sure what that is yet, but it looks like writing a directory block
found entries with invalid inode numbers in it, i.e. it's telling me
that there's something that has not been fixed up. I'm actually seeing
this in phase 4:

	- agno = 148
	Invalid inode number 0xfeffffffffffffff
	xfs_dir_ino_validate: XFS_ERROR_REPORT
	Metadata corruption detected at block 0x11fbb698/0x1000
	libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000

The second time around this does not happen, so the error has been
corrected in a later phase of the first pass.

> 5) Metadata CRC error detected at block 0x0/0x200
> but it is not CRC enabled fs

That's typically caused by junk in the superblock beyond the end of
the v4 superblock structure. It should be followed by "zeroing junk ...".

> Made xfs metadump without file obfuscation and I'm able to reproduce the
> problem reliably on the image (if some xfs developer wants metadump image then
> please mail me - I don't want to put it for everyone due to obvious reasons).
>
> So additional bug in xfs_metadump where file obfuscation "fixes" some issues.
> Does it obfuscate but keep invalid conditions (like keeping "/" in file name)?
> I guess it is not doing that.

I doubt it handles a "/" in a file name properly - that's rather
illegal, and the obfuscation code probably doesn't handle it at all.
FWIW, xfs_repair will trash those files anyway:

	entry at block 22 offset 560 in directory inode 419558142 has
	illegal name "/_198.jpg": clearing entry

So regardless of whether metadump handles them or not, that is not
going to change the fact that filenames with "/" in them are
broken.... But the real question here is: how did you get "/"
characters into filenames?

> [3571367.717167] XFS (loop0): Mounting Filesystem
> [3571367.883958] XFS (loop0): Ending clean mount
> [3571367.900733] XFS (loop0): Failed to initialize disk quotas.
>
> Files are accessible etc. Just no quota. Unfortunately no information why
> initialization failed.

I can't tell why that's happening yet. I'm not sure what the correct
state is supposed to be (the mount options will tell me), so I'm not
sure what went wrong. As it is, you probably should be upgrading to a
more recent kernel....

> So xfs_repair wasn't able to fix that, too.

xfs_repair isn't detecting that there is a problem because the
uquotino is not corrupt and the qflags are zero. Hence it doesn't do
anything.

More as I find it.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
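PS: if you want to eyeball the superblock quota fields yourself,
xfs_db will print them read-only. Something like this - a sketch, with
a made-up device path:

	# dump the quota inode fields from the primary superblock and
	# a few of the secondaries
	for ag in 0 1 5 16; do
		echo "superblock $ag:"
		xfs_db -r -c "sb $ag" -c "print uquotino gquotino qflags" /dev/sdX
	done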