On Mon, Jun 30, 2014 at 07:36:24AM +0200, Arkadiusz Miśkiewicz wrote:
> On Monday 30 of June 2014, Dave Chinner wrote:
> > [Compendium reply to all 3 emails]
> > 
> > On Sat, Jun 28, 2014 at 01:41:54AM +0200, Arkadiusz Miśkiewicz wrote:
> > > reset bad sb for ag 5
> > > 
> > > non-null group quota inode field in superblock 7
> > 
> > OK, so this is indicative of something screwed up a long time ago.
> > Firstly, the primary superblock shows:
> > 
> > 	uquotino = 4077961
> > 	gquotino = 0
> > 	qflags = 0
> > 
> > i.e. user quota @ inode 4077961, no group quota. The secondary
> > superblocks that are being warned about show:
> > 
> > 	uquotino = 0
> > 	gquotino = 4077962
> > 	qflags = 0
> > 
> > Which is clearly wrong. They should have been overwritten during the
> > growfs operation to match the primary superblock.
> > 
> > The similarity in inode numbers leads me to believe that at some point
> > both user and group/project quotas were enabled on this filesystem,
> 
> Both user and project quotas were enabled on this fs for the last few years.
> 
> > but right now only user quotas are enabled. It's only AGs 1-15 that
> > show this, so this seems to me that it is likely that this
> > filesystem was originally only 16 AGs and it's been grown many times
> > since?
> 
> The quotas were running fine until some repair run (i.e. before and after
> the first repair, mounting with quota succeeded) - some xfs_repair run
> later broke this.

Actually, it looks more likely that a quotacheck has failed part way
through, leaving the quota in an indeterminate state, and then repair
has been run, messing things up more...

> > > Invalid inode number 0xfeffffffffffffff
> > > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > > Metadata corruption detected at block 0x11fbb698/0x1000
> > > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > > done
> > 
> > Not sure what that is yet, but it looks like writing a directory
> > block found entries with invalid inode numbers in it. i.e. it's
> > telling me that there's something that has not been fixed up.
> > 
> > I'm actually seeing this in phase 4:
> > 
> > 	- agno = 148
> > Invalid inode number 0xfeffffffffffffff
> > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > Metadata corruption detected at block 0x11fbb698/0x1000
> > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > 
> > Second time around, this does not happen, so the error has been
> > corrected in a later phase of the first pass.
> 
> Here on two runs I got exactly the same report:
> 
> Phase 7 - verify and correct link counts...
> 
> Invalid inode number 0xfeffffffffffffff
> xfs_dir_ino_validate: XFS_ERROR_REPORT
> Metadata corruption detected at block 0x11fbb698/0x1000
> libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> Invalid inode number 0xfeffffffffffffff
> xfs_dir_ino_validate: XFS_ERROR_REPORT
> Metadata corruption detected at block 0x11fbb698/0x1000
> libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> 
> but there were more errors like this earlier, so repair fixed some but
> was left with these two.

Right, I suspect that I've got a partial fix for this already in place -
I was having xfs_repair -n ... SEGV when parsing the broken directory in
phase 6, so I have some code that prevents that crash which might also
be partially fixing this.

> > > 5) Metadata CRC error detected at block 0x0/0x200
> > > but it is not a CRC enabled fs
> > 
> > That's typically caused by junk in the superblock beyond the end
> > of the v4 superblock structure. It should be followed by "zeroing
> > junk ..."
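(For reference, a minimal read-only way to eyeball the quota fields quoted
above in the primary and in a secondary superblock is xfs_db; the image
path below is only a placeholder for a restored metadump:

	xfs_db -f -r /path/to/1t-image \
		-c 'sb 0' -c 'print uquotino gquotino qflags' \
		-c 'sb 7' -c 'print uquotino gquotino qflags'

"sb N" selects AG N's superblock copy, so the same three fields can be
compared across the primary and any of the secondaries.)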
> 
> Shouldn't repair fix superblocks when noticing v4 fs?

It does.

> I mean 3.2.0 repair reports:
> 
> $ xfs_repair -v ./1t-image
> Phase 1 - find and verify superblock...
>         - reporting progress in intervals of 15 minutes
>         - block cache size set to 748144 entries
> Phase 2 - using internal log
>         - zero log...
> zero_log: head block 2 tail block 2
>         - scan filesystem freespace and inode maps...
> Metadata CRC error detected at block 0x0/0x200
> zeroing unused portion of primary superblock (AG #0)
>         - 07:20:11: scanning filesystem freespace - 391 of 391 allocation groups done
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - 07:20:11: scanning agi unlinked lists - 391 of 391 allocation groups done
>         - process known inodes and perform inode discovery...
>         - agno = 0
> [...]
> 
> but if I run 3.1.11 after running 3.2.0 then superblocks get fixed:
> 
> $ ./xfsprogs/repair/xfs_repair -v ./1t-image
> Phase 1 - find and verify superblock...
>         - block cache size set to 748144 entries
> Phase 2 - using internal log
>         - zero log...
> zero_log: head block 2 tail block 2
>         - scan filesystem freespace and inode maps...
> zeroing unused portion of primary superblock (AG #0)
> ...
> 
> Shouldn't these be "unused" for 3.2.0, too (since v4 fs)?

I'm pretty sure that's indicative of older xfs_repair code not
understanding that sb_badfeatures2 didn't need to be zeroed. It wasn't
until:

cbd7508 xfs_repair: zero out unused parts of superblocks

that xfs_repair correctly sized the unused area of the superblock.
You'll probably find that mounting this filesystem resulted in
"sb_badfeatures2 mismatch detected. Correcting." or something similar
in dmesg because of this (now fixed) repair bug.

> > > Made xfs metadump without file obfuscation and I'm able to reproduce
> > > the problem reliably on the image (if some xfs developer wants the
> > > metadump image then please mail me - I don't want to put it up for
> > > everyone due to obvious reasons).
> > > 
> > > So an additional bug in xfs_metadump where file obfuscation "fixes"
> > > some issues. Does it obfuscate but keep invalid conditions (like
> > > keeping "/" in a file name)? I guess it is not doing that.
> > 
> > I doubt it handles a "/" in a file name properly - that's rather
> > illegal, and the obfuscation code probably doesn't handle it at all.
> 
> Would be nice to keep these bad conditions. The obfuscated metadump
> behaves differently than the non-obfuscated metadump with xfs_repair
> here (fewer issues with the obfuscated one than the non-obfuscated one),
> so obfuscation simply hides problems.

Sure, but we didn't even know this was a problem until now, so that
will have to wait....

> I assume that you do testing on the non-obfuscated dump I gave on irc?

Yes, but I've been cross checking against the obfuscated one with
xfs_db....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs