On Thu, 12 Jun 2008, Pierre Habouzit wrote:
>
> No, we're not using a shared git object repository, each developer
> has a git checkout in his /home (on NFS) but works for real in a workdir
> that lives on his local hard drive (to get faster compilation times,
> because NFS really sucks at speed for compilation). Though, people
> working on plain NFS have had the same problems.

Ahhh.. In that case it's not going to be a client caching issue - at least
not in the sense that two different clients are out-of-sync with each other
wrt caches. It sounds as if you only ever have one client that reads and
writes to the same git repository at a time.

So scratch all the previous theory.

Quite frankly, in that case, it sounds more like simply some NFS problem.
And we _have_ had NFS problems before. See the threads

 - bug: git-repack -a -d produces broken pack on NFS

   Turned out to apparently be ethernet packet corruption that was not
   detected by the hardware and was due to a badly seated ethernet card!

 - git 1.5.3.5 error over NFS

   Some unexplained corruption due to problems with pread() on NFS not
   returning data that was previously written.

for example.

Basically, NFS has many serious failure cases that can go undetected, and
it _could_ be that you actually have flaky NFS but never noticed it before
because most tools don't care as deeply as git does (ie if a bit is flipped
in some random data, a lot of tools will never notice).

There are supposed to be checksums etc on the network packets that NFS
uses, but:

 - the ethernet checksum (which is a fairly strong CRC) is sadly often not
   even checked by some switches and/or cards, and especially if it's a
   store-and-forward switch that doesn't check the CRC properly, it can
   end up re-sending a corrupt packet with a recomputed ethernet CRC that
   now matches the _corrupt_ data. Oops.


 - Perhaps worse, the ethernet checksum is purely a link-level one, not an
   end-to-end checksum, which not only explains how a switch can
   re-generate a broken one, but also means that even if the ethernet card
   checks it properly, it doesn't actually account for any corruption that
   happens _afterwards_. So if there is corruption going from the card to
   memory (which was apparently the problem in the first git thread
   above), the CRC got checked earlier and the new corruption isn't found.

 - there _is_ a TCP/IP-level packet check, with a checksum of the IP
   header, and a separate checksum of UDP and TCP data.

   HOWEVER. All these checksums are very very weak, and to make things
   worse, the UDP checksum can be entirely disabled, and quite often
   "better" ethernet cards will do checksumming for you in hardware, which
   again means that it's not an end-to-end checksum, and you have the
   exact same failure case as with the ethernet CRC.

IOW, there are safety nets in place, but they tend to be fairly easily
broken under certain circumstances.

Add to the above the possibility of just a kernel NFS bug (or an NFSd one),
and it would really be very interesting to hear:

 - do the errors seem to happen more at certain clients than others? If
   it's a client-side problem, it really should happen more for certain
   kernel versions or certain hardware.

 - have you had any other anecdotal evidence of problems with non-git
   usage? Unexplained SIGSEGVs if you have binaries over NFS, for example?
   Strange syntax errors when compiling over NFS?

I'm not discounting a git bug, but quite frankly, it really is worth
checking that your network/NFS setup is solid.

		Linus
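
To illustrate the "very very weak" point about the TCP/UDP checksum above,
here is a minimal Python sketch. It assumes the 16-bit ones'-complement sum
described in RFC 1071, and the names internet_checksum, original and
swapped are made up for this example. Because that checksum is just an
addition of 16-bit words, reordering the words leaves it unchanged, whereas
a CRC such as the one in zlib notices the change.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian word
    while total >> 16:
        # fold any carry bits back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical payload; any even-length byte string would do.
original = b"git pack data 0123"

# Corrupt it by swapping the first two 16-bit words.
swapped = original[2:4] + original[0:2] + original[4:]

same_sum = internet_checksum(original) == internet_checksum(swapped)
same_crc = zlib.crc32(original) == zlib.crc32(swapped)

print("payload changed:          ", original != swapped)
print("internet checksum matches:", same_sum)
print("CRC-32 matches:           ", same_crc)
```

The same blind spot applies to any pair of changes that cancel out in the
sum, which is one reason corruption can pass this check while git's own
object hashing still catches it.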