On 8/15/13 12:55 PM, Michael Maier wrote:
> Eric Sandeen wrote:
>> On 8/14/13 11:20 AM, Michael Maier wrote:
>>> Dave Chinner wrote:
>>
>> ...
>>
>>>> If it makes you feel any better, the bug that caused this had been
>>>> in the code for 15+ years and you are the first person I know of to
>>>> have ever hit it....
>>>
>>> Probably the second one :-) See
>>> http://thread.gmane.org/gmane.comp.file-systems.xfs.general/54428
>>>
>>>> xfs_repair doesn't appear to have any checks in it to detect this
>>>> situation or repair it - there are some conditions for zeroing the
>>>> unused parts of a superblock, but they are focussed around detecting
>>>> and correcting damage caused by a buggy Irix 6.5-beta mkfs from 15
>>>> years ago.
>>>
>>> The _big problem_ is: xfs_repair not just doesn't repair it, but it
>>> _causes data loss_ in some situations!
>>>
>>
>> So as far as I can tell at this point, a few things have happened to
>> result in this unfortunate situation. Congratulations, you hit a
>> perfect storm. :(
>
> I can appease you - as it "only" hit my backup device and because I
> noticed the problem before I really needed it: I didn't hit any data
> loss over all, because the original data is ok and I repeated the backup
> w/ the fixed FS now!
>
>> 1) prior resize operations populated unused portions of backup sbs w/ junk
>> 2) newer kernels fail to verify superblocks in this state
>> 3) during your growfs under 3.10, that verification failure aborted
>> backup superblock updates, leaving many unmodified
>> 4a) xfs_repair doesn't find or fix the junk in the backup sbs, and
>> 4b) when running, it looks for the superblocks which are "most matching"
>> other superblocks on the disk, and takes that version as correct.
>>
>> So you had 16 superblocks (0-15) which were correct after the growfs.
>> But 16 didn't verify and was aborted, so nothing was updated after that.
>> This means that 16 onward have the wrong number of AGs and disk blocks;
>> i.e. they are the pre-growfs size, and there are 26 of them.
>>
>> Today, xfs_repair sees this 26-to-16 vote, and decides that the 26
>> matching superblocks "win," rewrites the first superblock with this
>> geometry, and uses that to verify the rest of the filesystem. Hence
>> anything post-growfs looks out of bounds, and gets nuked.
>>
>> So right now, I'm thinking that the "proper geometry" heuristic should
>> be adjusted, but how to do that in general, I'm not sure. Weighting
>> sb 0 heavily, especially if it matches many subsequent superblocks,
>> seems somewhat reasonable.
>
> This would have been my next question! I repaired it w/ the git
> xfs_repair on the already reduced to original size FS. I think, if I
> would have done the same w/ the grown FS, the FS most probably would be
> reduced to the size before the growing.
>
> Wouldn't it be better to not grow at all if there are problems detected?
> Means: Don't do the check after the growing, but before? Ok, I could
> have done it myself ... . From now on, I will do it like this!

well, see the next couple patches I'm about to send to the list ... ;)

but a check prior wouldn't have helped you, because repair didn't detect
the problem that growfs choked on.

-Eric

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
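
For anyone following along, the 26-to-16 vote described above can be modelled roughly as below. This is only a toy sketch, not xfs_repair's actual code: the geometry values and the function names (build_superblocks, majority_geometry, weighted_geometry, sb0_weight) are hypothetical, and the weighted variant is just one possible reading of the "weight sb 0 heavily" idea.

```python
from collections import Counter

# Hypothetical geometries as (agcount, dblocks); the thread gives no real values.
NEW_GEOMETRY = (42, 4200)   # post-growfs, as recorded in sbs 0-15
OLD_GEOMETRY = (32, 3200)   # pre-growfs, still recorded in sbs 16-41


def build_superblocks():
    """State described in the thread: 16 updated superblocks, then 26 stale ones."""
    return [NEW_GEOMETRY] * 16 + [OLD_GEOMETRY] * 26


def majority_geometry(superblocks):
    """Plain vote: the geometry shared by the most superblocks wins."""
    return Counter(superblocks).most_common(1)[0][0]


def weighted_geometry(superblocks, sb0_weight):
    """Same vote, except sb 0 counts sb0_weight times instead of once."""
    votes = Counter(superblocks)
    votes[superblocks[0]] += sb0_weight - 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    sbs = build_superblocks()
    print("plain vote picks:   ", majority_geometry(sbs))
    print("weighted vote picks:", weighted_geometry(sbs, sb0_weight=12))
```

In this model the plain vote sides with the 26 stale superblocks, while giving sb 0 a weight of 12 tips the count to 27 against 26 in favour of the grown geometry. What a sensible weight would be in general is exactly the open question raised above.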