On Mon, Oct 25, 2010 at 8:25 AM, Mike Herrick <mike.herrick@xxxxxxxxx> wrote:
> On Mon, Oct 25, 2010 at 6:58 AM, Drew Northup <drew.northup@xxxxxxxxx> wrote:
>>
>> On Sun, 2010-10-24 at 11:54 -0400, Mike Herrick wrote:
>>> This weekend we're cutting over to use git for our source code control
>>> system. I've imported about 20 years worth of previous history using
>>> "git cvsimport" (takes about four hours). I then cloned the resulting
>>> repository onto five different machines (four Linux, one Solaris).
>>> I've set up a cron job to do a nightly "git fsck" on each of the five
>>> machines, and last night, two of the machines reported fsck errors on
>>> their initial run.
>> <snip>
>>
>>> The errors reported on these two machines were different, but what's
>>> interesting is that all of the missing blobs refer to various
>>> revisions of the same file, namely our "Changes" file (which is
>>> updated with each change). It's also the largest file in our
>>> repository (3.3M). I immediately started looking at logs to see if
>>> there was any indication of disk corruption and found none (no SMART
>>> errors either). Both of these machines have been stable over a
>>> multi-year period of time (no unexplained crashes). They're also
>>> older Linux machines (running 2.6.5-1.358 and 2.6.1-1.65, with
>>> relatively little memory: 1GB and .5GB), but with newly installed
>>> version of git (1.7.3.1). I initially used git-daemon for the clone
>>> process, but even using ssh, I still see fsck errors on the resulting
>>> clones on these two machines.
>>
>> Did you "git fsck" BEFORE you attempted to clone? Is it ONLY clones
>> showing errors? Alas, no blatant evidence of disk corruption is not
>> evidence of no disk corruption as well.
>
> Thanks for your reply.
>
> Only two of the five clones exhibit fsck errors and the server
> repository has no fsck errors.
>
> The two machines report different sets of missing blobs, but always in
> the "Changes" file (which has the somewhat unique characteristics that
> it is the "most changed" file in the repository, the largest, and one
> which is almost always only added to).
>
> I've since created two more clones on one of the machines (one using
> git-daemon and the other ssh) and both of these clones have the exact
> same set of missing blobs! For me this rules out disk corruption.
>
> The good(?) news is that the process is repeatable on one machine:
> cloning from a known good repository results in different (but
> repeatable) errors. Performing a second clone on the other "bad"
> machine also results in missing blobs, but different ones than the
> first (although all in the Changes file).
>
> My current thought is that somehow it's related to very old kernels?
> Apparently these machines are FC2 vintage.

We've backed out of our git cutover due to these errors.

I should also point out that on the machine where the errors are
repeatable, two of the clones were made to a local disk and one to an
NFS disk, and all three showed the same missing blobs (another
indication that it is unlikely to be a disk problem).

It's also interesting that the missing blobs seem to be in the same
general timeframe, 2001-2002 on one machine and 2008-2009 on the other
machine (as evidenced by the file sizes of the missing blobs):

[mikeh@mac5 src]$ for i in `cat /tmp/lin4`; do git cat-file -s $i ; done
1494474
1667992
1496198
1643008
1666070
1724686
1494201
1643297
1665137
1640569
1726140
[mikeh@mac5 src]$ for i in `cat /tmp/toulouse`; do git cat-file -s $i ; done
3055178
2858902
3060252
2887177
3038051
3033691
3008232
2981567
3000575
3081501
2995707
3070232
3076036
3059223
3075351
3070343
3054573
3033120
3028284
3078443
2896078
2895094
2973070
2859356

I was hoping that these would be on some type of boundary (and hence
powers of two), but that doesn't seem to be the case.

Mike.
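
The power-of-two check mentioned above can be done mechanically rather
than by eye. What follows is only a rough sketch: the helper names
nearest_pow2 and report_boundaries are made up for illustration, and
"nearest" is taken to mean the smaller absolute distance to a power of
two. For each size it prints the size, the nearest power of two, and the
signed offset from it.

```shell
#!/bin/sh
# Hypothetical helpers (not part of git): check how close blob sizes
# sit to a power-of-two boundary.

# nearest_pow2 SIZE: print the power of two closest to SIZE.
nearest_pow2() {
    size=$1
    p=1
    # Grow p until it is the largest power of two <= size.
    while [ $((p * 2)) -le "$size" ]; do
        p=$((p * 2))
    done
    # Pick p or the next power of two, whichever is closer.
    if [ $((size - p)) -le $((p * 2 - size)) ]; then
        echo "$p"
    else
        echo $((p * 2))
    fi
}

# report_boundaries: read one size per line on stdin and print
# "SIZE NEAREST_POWER_OF_TWO OFFSET" for each.
report_boundaries() {
    while read -r size; do
        [ -n "$size" ] || continue
        near=$(nearest_pow2 "$size")
        echo "$size $near $((size - near))"
    done
}
```

Assuming the functions are defined in the current shell, the loop above
can be piped straight into it:

for i in `cat /tmp/lin4`; do git cat-file -s $i ; done | report_boundaries

Offsets that are large and scattered would support the observation that
the sizes are not on any such boundary.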