On 8/2/16 8:20 AM, Jeff Mahoney wrote: > Hi all - > > While investigating a weird report on an internal list I found a few old > commits that don't look quite right and that may be very old bugs. I > know it's hard to go back nearly 15 years, especially in the days where > very short commit messages were still acceptable, and try to remember > why certain changes happened. In this case, a weird corner case[1] > would've been caught, xfs_repair would've bailed, and a file system may > have survived (despite obvious user error). > > 1/ Commit 5000d01d212f (white space cleanup) Meh, not a white space cleanup, is it! > diff --git a/libxlog/util.c b/libxlog/util.c > index 7aca165..aa3093d 100644 > --- a/libxlog/util.c > +++ b/libxlog/util.c > @@ -49,8 +49,10 @@ header_check_uuid(xfs_mount_t *mp, xlog_rec_header_t > *head) > printf("* ERROR: mismatched uuid in log\n" > "* SB : %s\n* log: %s\n", > uu_sb, uu_log); > + > + memcpy(&mp->m_sb.sb_uuid, head->h_fs_uuid, sizeof(uuid_t)); > > - return 1; > + return 0; > } However, after seeing the mismatch, it "fixes" it by copying the header uuid into the mount point uuid. But that doesn't seem like the right approach at all, and it renders all the callers who check the return value of header_check_uuid pointless. So yeah, doesn't look good to me. > This one may well have been intended as a repair operation or perhaps it > was accidentally duplicated from another chunk in the patch. At any > rate, it hides a mismatched UUID between the log and the superblock from > the rest of xfs_repair. The user sees the "error" message but it > carries on anyway. > > 2) Commit d321ceac8da (add libxlog directory.) > > I believe this was supposed to be as simple as pushing some > functionality from logprint into a new libxlog library, but the result > is that things that used to return an error no longer did. The > print_exit global that was initialized to 1 in logprint is initialized > to 0 in libxlog and never set. So we always print an error message but > then carry on. So even if the header_check_uuid() call above would fail > properly, the error is printed and then the error condition ignored. Sigh, yeah, the old commits are wild west. :( In this case header_check_uuid returned 0 anyway,so print_exit would not have helped, but I think you're right. A perfect storm of derp. ;) > -Jeff > > [1] The details are still murky, but what I got was that the user ran > xfs_repair -L (yup, i know) on an image file that contained partitions. > It found a valid XFS superblock and then "repaired" the file system to > an empty state since everything it found was "corrupt." I suspect that > there was an XFS file system on the raw image file, which was then > partitioned without clearing the MBR, and the expected XFS file system > was created on the first partition. xfs_repair was pointed at the whole > image file, discovered the old superblock, and remade the fs in its own > image since nothing was at the proper locations. Oh, so it found 4 matching, old, valid, superblocks? Ugh. I don't know how to protect against that, although <handwave> it probably should have found a few non-matching supers along the way as well. I wonder if we should be more cautious in that case. I could imagine that maybe for each candidate super we find, we should look at its geometry, and spot-check the other locations that it indicates should contain a superblock. If we get enough semi-valid but conflicting "sets," maybe we should bail out and ask. It's quite a corner case, tho. Any chance you have full xfs_repair output? -Eric
Attachment:
signature.asc
Description: OpenPGP digital signature
_______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs