Re: Consolidate SHA1 object file close

On Wed, Jun 11, 2008 at 03:17:04PM +0000, Linus Torvalds wrote:
> 
> 
> On Wed, 11 Jun 2008, Pierre Habouzit wrote:
> > 
> >   Could this be the source of a problem we often meet at work ? Let me
> > try to describe it.
> 
> The fsync() *should* make no difference unless you actually crash. So my 
> initial reaction is no, but non-coherent client-side write caching over 
> NFS may actually make a difference.

  That's what I thought as well, but… one never knows ;)

> >   We work with our git repositories (storages, I should say) on NFS
> > homes, with workdirs in a local directory (NFS homes are backed up daily,
> > so everything committed gets backed up, and developers get shorter
> > compilation times thanks to the local FS).
> 
> Ok, so your actual git object directory is on NFS?

  Yes.

> >   Quite often, when people commit, they end up with corrupt repositories.
> > The symptom is a `cannot read <sha1>` error message (sometimes many of
> > them). The usual way to "fix" it is to run git fsck, then git reset
> > (because after the fsck the index is totally screwed and all local files
> > are marked new), and usually everything is fine after that.
> 
> Hmm. Very interesting. That definitely sounds like a cache coherency 
> issue (ie the "fsck" probably doesn't really _do_ anything, it just 
> delays things and possibly causes memory pressure to throw some stuff out 
> of the cache).
> 
> What clients, what server?

  The server runs the NFSv3 kernel server from Debian etch's 2.6.18 kernel
(up to date). Clients are various Ubuntu/Debian machines with kernels from
2.6.18 upward, some with 2.6.22, 2.6.24 and 2.6.25. It's a really simple
setup; no clusters are involved. The server exports an ext3 filesystem on
a dm-crypt partition, though I would be surprised if that mattered.

> That said, if there is some problem with that whole thing, then yes, the 
> fsync() may well hide it. So yes, adding the fsync() is certainly worth 
> testing.
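A minimal C sketch of what "adding the fsync()" amounts to at the system-call level. The helper name finalize_object_file() and its do_fsync switch are hypothetical, not git's actual code: the point is only that fsync() forces the written data out to stable storage (for an NFS client, to the server) before close(), and that the return value of close() deserves checking too, since on NFS deferred write errors may only be reported there.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>	/* perror() */
#include <unistd.h>	/* fsync(), close() */

/*
 * Hypothetical helper, not git's actual code: finish a freshly written
 * object file. With do_fsync set, the data is forced out to stable
 * storage before the descriptor is closed.
 * Returns 0 on success, -1 on failure; fd is closed in either case.
 */
int finalize_object_file(int fd, int do_fsync)
{
	if (do_fsync && fsync(fd) < 0) {
		perror("fsync");
		close(fd);
		return -1;
	}
	/* close() can fail as well: on NFS, deferred write errors may show up here. */
	if (close(fd) < 0) {
		perror("close");
		return -1;
	}
	return 0;
}
```

With do_fsync set to 0 this reduces to a checked close(), which is the baseline the fsync() experiment is being compared against.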

Okay, I'll try to get my colleagues to use that and see whether they still
have the issues. I work on a laptop, not on NFS, so I'm not the one having
the issues, only the one having to fix them on others' machines ;P

> >   This is not really a hard corruption, and it's really hard to
> > reproduce. I don't know why it happens, and I wonder whether this patch
> > could help or whether it's unrelated. I can only speculate, since it's so
> > hard to reproduce and seems to depend on the load of the NFS server :/
> 
> Yes, that sounds very much like a cache coherency issue. The "corruption" 
> goes away when the cache gets flushed and the clients see the real state 
> again. But as mentioned, git should already do things in a way that this 
> should all work, but hey, that's using certain assumptions that perhaps 
> aren't true in your environment.
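The assumption referred to here is, roughly, that a loose object only ever becomes visible under its final name once it has been completely written. The C sketch below illustrates that general write-to-a-temporary-name-then-rename pattern; write_file_atomically() and its parameters are hypothetical, and git's real code path differs in detail. Note that rename() being atomic on the server does not by itself prevent another NFS client from serving stale cached directory entries or attributes, which is exactly the kind of coherency problem suspected in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>	/* rename() */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write buf to tmp_path, optionally fsync() it, and
 * only then rename() it to final_path. A reader looking at final_path
 * sees either nothing or the complete contents, never a partial file.
 * Returns 0 on success, -1 on failure (the temporary file is removed).
 */
int write_file_atomically(const char *tmp_path, const char *final_path,
			  const void *buf, size_t len, int do_fsync)
{
	const char *p = buf;
	/* O_EXCL: refuse to clobber an existing temporary file. */
	int fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
	if (fd < 0)
		return -1;

	/* write() may write less than asked; loop until everything is out. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			goto fail_close;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Optional: force the data to stable storage before publishing it. */
	if (do_fsync && fsync(fd) < 0)
		goto fail_close;
	/* Check close() too: NFS may report deferred write errors here. */
	if (close(fd) < 0)
		goto fail_unlink;
	/* Publish under the final name only once the contents are complete. */
	if (rename(tmp_path, final_path) < 0)
		goto fail_unlink;
	return 0;

fail_close:
	close(fd);
fail_unlink:
	unlink(tmp_path);
	return -1;
}
```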

  Well, we've actually had the issue for quite a long time, and given that
it's hard to reproduce, I'm never in a position to give more useful
information :/ We'll see whether the fsync() helps or not…

-- 
·O·  Pierre Habouzit
··O                                                madcoder@xxxxxxxxxx
OOO                                                http://www.madism.org
