On Tue, 6 Jan 2009, R. Tyler Ballance wrote: > > > > The thing to do is > > > > - untar it on some trusted machine with a local disk and a known-good > > filesystem. > > > > IOW, not that networked samba share. > > > > - verify that it really does happen on that machine, with that untarred > > image. Because maybe it doesn't. > > Unfortunately it doesn't Well, that's not necessarily "unfortunate". It does actually end up showing that the objects themselves were apparently never really corrupt. So there is no fundamental data structure corrupttion - because when you copy the repository, it's all good agin! In other words, that's not a worthless piece of information at all, and it does tell us a lot, namely that the corruption was never real long-term data corruption of the git object archive, but something local and temporary. Again, we're really back to either: - it could be some _temporary_ git corruption caused internally inside a git process - ie a wild pointer, or perhaps a race condition (but we don't really use threading in 1.6.0.4 unless you ask for it, and even then just for pack-file generation) - or it's the disk cache corruption, and the tar/untar ended up flushing it. And quite frankly, since the corruption seems to be site-specific, I really do suspect the second case. Although it's possible, of course, that it could be some compiler issue that makes _your_ binaries have issues even when nobody else sees it. > what I did notice was this when I did a `git > status` in the directory right after untarring: > tyler@grapefruit:~/jburgess_main> git status > # > # ---impressive amount of file names fly by--- > # ----snip--- > # > # Untracked files: > # (use "git add <file>..." to include in what will be > committed) > # > # artwork/ > # bt/ > # flash/ Hmm. That's actually _normal_ under some circumstances. At least with older git versions, or if your .git/index file couldn't be rewritten for some reason - your existing index file contains all the old stat information, and if git cannot (or, in the case of older git version, just will not) refresh it automatically, it will show all the files as changed, even if it's just the inode number that really changed. A _normal_ git install should have auto-refreshed the index, though. Unless the tar archive only contained the ".git" directory, and not the checkout? > tyler@grapefruit:~/jburgess_main> > > Basically, somehow Git thinks that *every* file in the repository is > deleted at this point. I went ahead and performed a `git reset --hard` > to see if the issue would manifest itself thereafter, but it did not. That would be what I'd expect if you had only tarred up .git, although then I wouldn't have expected those "Untracked files:". Hmm. Without being able to look at the archive, I'm just guessing randomly. > I did try to do a git-fsck(1), and this is what I got: > tyler@grapefruit:~/jburgess_main> /usr/local/bin/git fsck --full > [1] 19381 segmentation fault /usr/local/bin/git fsck --full > tyler@grapefruit:~/jburgess_main> .. and that's the unrelated fsck bug that got fixed later. > > The hope is that you caught the corruption in the cache, and it > > actually got written out to the tar-file. But if it _is_ a disk cache > > (well, network cache) issue, maybe the IO required to tar everything up > > was enough to flush it, and the tar-file actually _works_ because it > > got repopulated correctly. > > When I was working through this with Jan, one of the things that we did > was move the actual object file in .git/objects, they existed so maybe I > could look into those to check? Yes. If you have any bad loose objects, if you compare them to the good objects with the same name, that's going to be interesting information. The pattern of corruption can be very telling. For example, on Linux, a disk cache corruption would usually be at 4kB block boundaries, because that's the granularity of the cache. While a bit error would be obvious etc etc. > I checked with our operations team, and contrary to my suspicion (your > NFS comment piqued my curiosity), these disks that are actually on the > machines are not NFS mounts but rather local disk arrays. Ok, that generally makes caching much simpler. What filesystem? Is there anything else that you do that is site-specific and/or slightly different? For example, a long time ago we had a bug related to CRLF conversion which would cause a use-after-free problem, and that would corrupt the data internally to git. And dobody else saw it than this one person, and it was a total mystery to everybody until we realized that he used this one feature that nobody else was using. So as you're on OS X, I assume you don't have CRLF conversion, but maybe you use some other feature that we support but nobody really actually uses. Like keyword expansion or something? Oh - that would also explain why you got all those entries in "git status" that went away when you did a "git reset --hard": if you had some keyword expansion (or CRLF) enabled in the original users "~/.gitconfig", that checkout would have had expansion/CRLF/whatever conversion, but then when you tarred/untarred it on another setup, the expansion would be seen as a difference because it wasn't enabled. Hmm? Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html