On Wed, 11 Jun 2008, Pierre Habouzit wrote:
>
> Could this be the source of a problem we often meet at work ? Let me
> try to describe it.

The fsync() *should* make no difference unless you actually crash. So my
initial reaction is no, but non-coherent client-side write caching over
NFS may actually make a difference.

> We work with our git repositories (storages I should say) on NFS
> homes, with workdirs on a local directory (NFS homes are backuped daily,
> hence everything commited get backuped, and developers have shorter
> compilation times thanks to the local FS).

Ok, so your actual git object directory is on NFS?

> Quite often, when people commit, they have corrupt repositories. The
> symptom is a `cannot read <sha1>` error message (or many at times). The
> usual way to "fix" it is to git fsck, and git reset (because after the
> fsck the index is totally screwed and all local files are marked new),
> and usually everything is fine then.

Hmm. Very interesting. That definitely sounds like a cache coherency
issue (ie the "fsck" probably doesn't really _do_ anything, it just
delays things and possibly causes memory pressure to throw some stuff
out of the cache).

What clients, what server? NFS clients (I assume v2, which is not
coherent) _should_ be providing what is called open-close consistency,
which means that while clients can cache data locally, they should aim
to be consistent between two clients over an open-close pair (ie if two
clients have the same file open at the same time, there are no
consistency guarantees, but if you close on one client and then open on
another, the data should be consistent).

If open-close consistency doesn't work, then things like various
parallel load distribution things (clusters with an NFS filesystem doing
parallel makes, etc) don't tend to work all that well either (ie an
object file is written on one client, and then used for linking on
another).

And that is what git does: even without the fsync(), git will "close()"
the file before it actually does the link + unlink to move it to the new
position. So it all _should_ be perfectly consistent even in the absence
of explicit syncs.

That said, if there is some problem with that whole thing, then yes, the
fsync() may well hide it. So yes, adding the fsync() is certainly worth
testing.

> This is not really a hard corruption, and it's really hard to
> reproduce, I don't know why it happens, and I wonder if this patch could
> help, or if it's unrelated. I can only bring speculations as it's really
> hard to reproduce, and it quite depends on the load of the NFS server :/

Yes, that sounds very much like a cache coherency issue. The
"corruption" goes away when the cache gets flushed and the clients see
the real state again.

But as mentioned, git should already do things in a way that this should
all work, but hey, that's using certain assumptions that perhaps aren't
true in your environment.

		Linus
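
For reference, the write, close, link + unlink ordering discussed above
can be sketched in C roughly as follows. This is a minimal illustration
under stated assumptions, not git's actual code: the function name
publish_object(), its parameters, and the do_fsync switch are all
hypothetical and exist only to show where the optional fsync() would sit
relative to close() and link().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not git's real implementation): write `len` bytes
 * from `buf` to `tmp_path`, optionally fsync(), close(), and only then
 * link the file to `final_path` and unlink the temporary name.
 * Returns 0 on success, -1 on failure.
 */
int publish_object(const char *tmp_path, const char *final_path,
                   const void *buf, size_t len, int do_fsync)
{
    assert(tmp_path && final_path);

    /* O_EXCL: refuse to reuse a stale temporary file. */
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0444);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the write */
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* The step under discussion: only matters after a crash, unless the
     * client's caching is misbehaving. */
    if (do_fsync && fsync(fd) < 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }

    /* With open-close consistency, data written before this close()
     * should be visible to any client that opens the file afterwards. */
    if (close(fd) < 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Publish under the final name only after the close. An existing
     * target is treated as "already there" in this sketch. */
    if (link(tmp_path, final_path) < 0 && errno != EEXIST) {
        unlink(tmp_path);
        return -1;
    }
    unlink(tmp_path);
    return 0;
}
```

Calling it with do_fsync set to 0 and then to 1 on the NFS-backed object
directory would be one way to test whether the fsync() hides the
problem, as suggested above.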