On Wed, 2009-01-07 at 08:07 -0800, Linus Torvalds wrote:
> Well, that's not necessarily "unfortunate". It does actually end up
> showing that the objects themselves were apparently never really corrupt.
>
> So there is no fundamental data structure corruption - because when you
> copy the repository, it's all good again!

> - it could be some _temporary_ git corruption caused internally inside a
>   git process - ie a wild pointer, or perhaps a race condition (but we
>   don't really use threading in 1.6.0.4 unless you ask for it, and even
>   then just for pack-file generation)

I have a feeling it's something like this. One of our operations guys
did some research while I was looking at code, and he came across this:

On Wed, 2009-01-07 at 14:17 -0800, Ken Brownfield wrote:
> git-merge is using too much RAM, and failing to malloc() but NOT
> reporting it. This is all sorts of bad:
>
> A) using an unscalable amount of RAM
> B) failing to detect malloc() failure
> C) reporting file corruption instead
>
> I was able to reproduce this.
>
> limit ~1.5GB -> corrupt file
> limit ~3GB -> magically no longer corrupt.
>
> The false fail may be limited to git-merge, but git status also
> allocates the same amount of RAM.
>
> To temporarily work around this problem, issue this once you log in to
> a dev box:
>
> tcsh:
>   limit vmemoryuse 3000000
> bash:
>   ulimit -v 3000000
>
> Be gentle.

> And quite frankly, since the corruption seems to be site-specific, I
> really do suspect the second case. Although it's possible, of course, that
> it could be some compiler issue that makes _your_ binaries have issues
> even when nobody else sees it.

I think you're correct, insofar as our major site-specific alteration
has come up on the mailing list before (okay, maybe two site-specific
things):

* Our Git repo is ~7.1GB
* ulimit -v is set to ~1.5GB

I think I know how this could be failing and corrupting things
(assuming it's malloc(3)-related).
What I'm thinking is that in xmalloc() or one of the other x*()
functions, the malloc(size) is failing because of the ulimits, and then
potentially somewhere it's silently failing or maybe even accidentally
returning one of those "malloc(1)" pointers?

I've got two new tarred repositories from two developers the issue
happened to today, so I'm flush with sample repositories to try stuff
on :)

> Hmm. That's actually _normal_ under some circumstances. At least with
> older git versions, or if your .git/index file couldn't be rewritten for
> some reason - your existing index file contains all the old stat
> information, and if git cannot (or, in the case of older git version, just
> will not) refresh it automatically, it will show all the files as changed,
> even if it's just the inode number that really changed.
>
> A _normal_ git install should have auto-refreshed the index, though.
> Unless the tar archive only contained the ".git" directory, and not the
> checkout?

I believe the issues I noticed when untarring the repo were a red
herring. I ran `git diff` after untarring and noticed that only a
certain set of files were changed; I'm willing to go so far as to guess
that they were the files affected in the corrupted packs. Of the 32k
files in our repository, 98 were actually different after untarring
(according to git-diff(1)).

> And nobody else saw it than this one person, and it was a total mystery to
> everybody until we realized that he used this one feature that nobody else
> was using. So as you're on OS X, I assume you don't have CRLF conversion,
> but maybe you use some other feature that we support but nobody really
> actually uses. Like keyword expansion or something?

The two new folks this happened to today had nothing "special" about
them other than the ulimit.

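To make the malloc() hypothesis above a bit more concrete, here is a
minimal sketch in C of an allocation wrapper that cannot fail silently.
This is NOT git's actual xmalloc(); the names checked_malloc() and
die_on_oom_malloc(), the message text, and the exit code are all made up
for illustration. The only assumption about the real code is the idea
mentioned above of falling back to malloc(1) for zero-byte requests.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not git's xmalloc): allocate `size` bytes.
 * A zero-byte request is retried as malloc(1), because malloc(0) is
 * allowed to return NULL even when memory is available. Returns NULL
 * only if the allocator genuinely could not satisfy the request.
 */
static void *checked_malloc(size_t size)
{
	void *ret = malloc(size);

	if (!ret && !size)
		ret = malloc(1);
	return ret;
}

/*
 * Hypothetical wrapper: never hands NULL back to the caller. If the
 * allocation fails (for example because `ulimit -v` caps the address
 * space), report the requested size and exit instead of continuing.
 */
static void *die_on_oom_malloc(size_t size)
{
	void *ret = checked_malloc(size);

	if (!ret) {
		fprintf(stderr, "fatal: out of memory, malloc(%zu) failed\n",
			size);
		exit(128);
	}
	return ret;
}
```

If every large allocation went through something shaped like
die_on_oom_malloc(), hitting the ~1.5GB limit should in principle show
up as an explicit out-of-memory error rather than as a report of a
corrupt file. Whether that is what is actually going wrong here is still
only a guess.
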
I've got the script(1) output of performing git-ls-files(1) and some
other commands that I tried. Nothing they output was particularly
informative or interesting, and I don't think it will help if this
really is a memory-related issue. That said, I'd be more than happy to
send it to a couple of you (Junio, Linus, Nico).

I'm *so* ready for this bug to die >=\

Cheers
--
-R. Tyler Ballance
Slide, Inc.