Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file


 



On Wed, 2009-01-07 at 08:07 -0800, Linus Torvalds wrote:
> Well, that's not necessarily "unfortunate". It does actually end up 
> showing that the objects themselves were apparently never really corrupt.
> 
> So there is no fundamental data structure corruption - because when you 
> copy the repository, it's all good again!
>  - it could be some _temporary_ git corruption caused internally inside a 
>    git process - ie a wild pointer, or perhaps a race condition (but we 
>    don't really use threading in 1.6.0.4 unless you ask for it, and even 
>    then just for pack-file generation)

I have a feeling it's something like this. One of our operations guys
did some research while I was looking at code, and he came across this:

        On Wed, 2009-01-07 at 14:17 -0800, Ken Brownfield wrote:
        > git-merge is using too much RAM, and failing to malloc() but NOT
        > reporting it.  This is all sorts of bad:
        > 
        >   A) using an unscalable amount of RAM
        >   B) failing to detect malloc() failure
        >   C) reporting file corruption instead
        > I was able to reproduce this.
        >
        > limit ~1.5GB -> corrupt file
        > limit ~3GB -> magically no longer corrupt.
        >
        > The false fail may be limited to git-merge, but git status also  
        > allocates the same amount of RAM.
        > 
        > To temporarily work around this problem, issue this once you log in to
        > a dev box:
        > 
        > tcsh:
        >         limit vmemoryuse 3000000
        > bash:
        >         ulimit -v 3000000
        > 
        > Be gentle.
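
Ken's experiment can be repeated in a controlled way by running a
command under an explicit virtual-memory cap. This is only a minimal
sketch, assuming Linux (where `ulimit -v` corresponds to RLIMIT_AS) and
Python's standard library; `run_with_vm_limit` is a hypothetical helper
name and not part of git:

```python
import resource
import subprocess


def run_with_vm_limit(cmd, limit_kib):
    """Run cmd with its soft RLIMIT_AS set to limit_kib KiB (like `ulimit -v`)."""

    def apply_limit():
        # Executed in the child process between fork() and exec().
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        wanted = limit_kib * 1024  # `ulimit -v` counts in 1024-byte units
        if hard != resource.RLIM_INFINITY:
            wanted = min(wanted, hard)  # soft limit may not exceed the hard limit
        resource.setrlimit(resource.RLIMIT_AS, (wanted, hard))

    return subprocess.run(cmd, preexec_fn=apply_limit,
                          capture_output=True, text=True)
```

Calling `run_with_vm_limit(["git", "status"], 1500000)` and then again
with 3000000 inside an affected checkout should let the two cases be
compared by exit status and stderr, without touching the login shell's
own limits.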
        

> And quite frankly, since the corruption seems to be site-specific, I 
> really do suspect the second case. Although it's possible, of course, that 
> it could be some compiler issue that makes _your_ binaries have issues 
> even when nobody else sees it.

I think you're correct, insofar as our major site-specific alteration
has come up on the mailing list before (okay, maybe two site-specific
things):
	* Our Git repo is ~7.1GB
	* ulimit -v is set to ~1.5G
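
To confirm which limit a given process actually inherited, the value can
be read back directly. A small sketch, assuming Linux and Python's
standard `resource` module; `address_space_limit_kib` is a made-up name
for illustration:

```python
import resource


def address_space_limit_kib():
    """Return the soft RLIMIT_AS in KiB (the unit `ulimit -v` reports),
    or None when the limit is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft // 1024
```

With `ulimit -v 3000000` in effect in the parent shell, this would be
expected to return 3000000.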


I think I know how this could be failing and corrupting things (assuming
it's malloc(3)-related).


What I'm thinking is that in xmalloc() or one of the other x*()
functions, the malloc(size) is failing because of the ulimits, and then
somewhere it's potentially failing silently, or maybe even
accidentally returning one of those "malloc(1)" pointers?
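
To make that failure mode concrete, here is a rough illustration of what
a checked allocation wrapper is supposed to guarantee. It is Python
driving the C library's malloc() through ctypes, not git's actual C
code; `checked_malloc` is a hypothetical name, and the zero-size retry
with malloc(1) is only an assumption about how such wrappers typically
behave:

```python
import ctypes

# Assumes a Unix-like system where the C library is already loaded
# into the running process.
libc = ctypes.CDLL(None)
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def checked_malloc(size):
    """Return a non-NULL address from malloc(size) or raise MemoryError."""
    ptr = libc.malloc(size)  # ctypes maps a NULL c_void_p result to None
    if ptr is None and size == 0:
        # Assumption: a zero-byte request may legitimately return NULL,
        # so retry with one byte to hand back a usable pointer.
        ptr = libc.malloc(1)
    if ptr is None:
        # The important part: fail loudly instead of carrying on with NULL.
        raise MemoryError("malloc(%d) failed" % size)
    return ptr
```

If any caller bypasses a check like this, or a raw malloc() result is
used unchecked, the failure would surface later and somewhere else,
which would fit the misleading "corruption" reports.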

I've got two new tarred repositories from two developers who hit the
issue today, so I'm flush with sample repositories to try stuff
on :)


> 
> Hmm. That's actually _normal_ under some circumstances. At least with 
> older git versions, or if your .git/index file couldn't be rewritten for 
> some reason - your existing index file contains all the old stat 
> information, and if git cannot (or, in the case of older git version, just 
> will not) refresh it automatically, it will show all the files as changed, 
> even if it's just the inode number that really changed.
> 
> A _normal_ git install should have auto-refreshed the index, though. 
> Unless the tar archive only contained the ".git" directory, and not the 
> checkout?

I believe the issues I noticed when untarring the repo were a red
herring. I ran `git diff` after untarring and noticed that only a
certain set of files were changed; I'm willing to go so far as to guess
that they were the files affected by the corrupted packs. Of the 32k
files in our repository, 98 were actually different after untarring
(according to git-diff(1)).

> And nobody else saw it than this one person, and it was a total mystery to 
> everybody until we realized that he used this one feature that nobody else 
> was using. So as you're on OS X, I assume you don't have CRLF conversion, 
> but maybe you use some other feature that we support but nobody really 
> actually uses. Like keyword expansion or something?

The two new folks this happened to today had nothing "special" about
them other than the ulimit.


I've got the script(1) output of running git-ls-files(1) and some
other commands that I tried. Nothing they output was particularly
informative or interesting, and I don't think it will help if this
really is a memory-related issue. That said, I'd be more than happy to
send it to a couple of you (Junio, Linus, Nico).


I'm *so* ready for this bug to die >=\


Cheers

-- 
-R. Tyler Ballance
Slide, Inc.


