Re: large(25G) repository in git

Nicolas Pitre wrote:
> Because it is way more complex for git to do that than for ssh to keep 
> the connection alive.  And normally there is no need as git is supposed 
> to be faster than that.

Sure, I'll buy that.
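
For what it's worth, the ssh side of that can be handled entirely in
the client config; a minimal sketch, assuming the host alias matches
the remote (the alias and intervals here are just placeholders):

==
# ~/.ssh/config -- send keepalive probes so a long-running push isn't
# dropped by an idle timeout (host alias and values are placeholders)
Host bf-yum
    ServerAliveInterval 60
    ServerAliveCountMax 10
==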

>>>> So, to work around that, I ran git gc.  When done, I discovered that
>>>> git repacked the *entire* repository.  While not something I care for,
>>>> I can understand that, and live with it.  It just took *hours* to do so.
>>>>
>>>> Then, what really annoys me, is that when I finally did the push, it
>>>> tried sending the single 27G pack file, when the remote already had
>>>> 25G of the repository in several different packs(the site was an
>>>> hg->git conversion).  This part is just unacceptable.
>>> This shouldn't happen either.  When pushing, git reconstruct a pack with 
>>> only the necessary objects to transmit.  Are you sure it was really 
>>> trying to send a 27G pack?
>> Of course I'm sure.  I wouldn't have sent the email if it didn't
>> happen.  And, I have the bandwidthd graph and lost time to prove it.
> 
> As much as I would like to believe you, this doesn't help fixing the 
> problem if you don't provide more information about this.  For example, 
> the output from git during the whole operation might give us the 
> beginning of a clue.  Otherwise, all I can tell you is that such thing 
> is not supposed to happen.

First off, you've put a bad tone on this.  It appears that you are
saying I'm mistaken, and that it didn't send all that data: "It can't
happen, so it didn't happen."  Believe me, if it hadn't resent all
this data, I wouldn't have even sent the email.

In any event, we got lucky.  I *do* have a log of the push side of
this problem.  I doubt it's enough to figure out the actual cause, though.

==
ofbiz@lnxwww10:/job/@anon-site@> git push bf-yum
Counting objects: 96637, done.

Compressing objects:   6% (2413/34478)   478)
Read from remote host @anon-site-dev@.brainfood.com: Connection reset
by peer
Compressing objects:  27% (9458/34478)

Compressing objects: 100% (34478/34478), done.
error: pack-objects died with strange error
error: failed to push some refs to 'ssh://bf-yum/@anon-site@'
ofbiz@lnxwww10:/job/@anon-site@>
ofbiz@lnxwww10:/job/@anon-site@>
ofbiz@lnxwww10:/job/@anon-site@>
ofbiz@lnxwww10:/job/@anon-site@> git push bf-yum
Counting objects: 96637, done.
Killed by signal 2.:   5% (1866/34478)

ofbiz@lnxwww10:/job/@anon-site@> git gc
Counting objects: 96637, done.
Compressing objects:  27% (9453/34478)

Compressing objects: 100% (34478/34478), done.
Writing objects: 100% (96637/96637), done.
Total 96637 (delta 48713), reused 88929 (delta 43905)
Removing duplicate objects: 100% (256/256), done.
ofbiz@lnxwww10:/job/@anon-site@>
ofbiz@lnxwww10:/job/@anon-site@>
ofbiz@lnxwww10:/job/@anon-site@> du .git -sc
26797788        .git
26797788        total
ofbiz@lnxwww10:/job/@anon-site@> git push bf-yum
Counting objects: 96637, done.
Compressing objects: 100% (29670/29670), done.
Writing objects: 100% (96637/96637), 25.49 GiB | 226 KiB/s, done.
Total 96637 (delta 48713), reused 96637 (delta 48713)
To ssh://bf-yum/@anon-site@
 * [new branch]      master -> lnxwww10
==
ofbiz@lnxwww10:/job/@anon-site@> ls .git/objects/pack/ -l
total 26762436
-r--r--r-- 1 ofbiz users     3452052 2009-03-21 23:11
pack-0d7b399006ae0a57ff3df07fdcaedbaeb7e63d0a.idx
-r--r--r-- 1 ofbiz users 27374508409 2009-03-21 23:11
pack-0d7b399006ae0a57ff3df07fdcaedbaeb7e63d0a.pack
==

I have a bf-yum remote defined that pushes to a branch on the remote;
once the push lands, I then do a merge on the target machine.
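
Roughly, the setup looks like this (reconstructed for illustration;
the URL and names are anonymized the same way as in the log above):

==
# .git/config on lnxwww10 -- push master to a per-host branch on bf-yum
[remote "bf-yum"]
        url = ssh://bf-yum/@anon-site@
        push = refs/heads/master:refs/heads/lnxwww10

# then, on the target machine, once the push has landed:
git merge lnxwww10
==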

The 'killed by signal 2' is when I hit ctrl-c.

The second group was done from another window.  There's only a single
pack file now.

The @anon-site@ stuff is me removing client identifiers.  It's the
only editing I did to the screen log.

> 
>> After I ran git push, ssh timed out, the temp pack that was created
>> was then removed, as git complained about the connection being gone.
> 
> On a push, there is no creation of a temp pack.  It is always produced 
> on the fly and pushed straight via the ssh connection.

No.  I saw a temp file in strace.  It *was* created on the local disk,
and *not* sent on the fly.

>> I then decided to do a 'git gc', which collapsed all the separate
>> packs into one.  This allowed git push to proceed quickly, but at that
>> point, it started sending the entire pack.
> 
> If this was really the case, then this is definitely a bug.  Please take 
> a snapshot of your screen with git messages if this ever happens again.

See above.

> 
>> It's entirely possible that the temp pack created by git push was
>> incremental; it just took too long to create it, so it got aborted.
> 
> The push operation has multiple phases.  You should see "counting 
> objects", "compressing objects" and "writing objects".  Could you give 
> us an approximation of how long each of those phases took?

Well, counting was quick enough.  Compression took at *least* 2 hours,
might have been 4 or more.  This all started Friday evening.  I was
watching it a bit at the beginning, but then went out, and it died
after I got back to it.

>> I forgot to mention previously, that the source machine was running
>> git 1.5.6.5, and was pushing to 1.5.6.3.
>>
>> I've tried duplicating this problem on a machine with 1.6.1.3, but
>> either I don't fully understand the issue enough to replicate it, or
>> the newer git doesn't have the problem.
> 
> That's possible.  Maybe others on the list might recall possible issues 
> related to this that might have been fixed during that time.

Well, I looked at the release notes between all these versions.
Nothing stands out, but I'm aware that the changelog/release-note
entry for a change doesn't always describe the actual bug that
prompted it.

>> Um, if it's missing documentation, then how am I supposed to know
>> about it?
> 
> Asking on the list, like you did.  However this attribute should be 
> documented as well of course.  I even think that someone posted a patch 
> for it a while ago which might have been dropped.

What I'd like is a way to say that a certain pattern of files should
only be deduped, and not deltified.  This would still handle the case
of exact copies or renames, which would be a win for us, but generally
when a new video (or doc or PDF) is uploaded, it's a lot of work to
try to deltify it, for very little benefit.
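
If the undocumented attribute mentioned above is the 'delta' one, then
presumably something along these lines in .gitattributes would cover
the "dedup but don't deltify" part (the patterns are only examples for
our kind of content; identical blobs would still be stored once):

==
# .gitattributes -- skip delta compression attempts for large media
*.mp4 -delta
*.pdf -delta
*.doc -delta
==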

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
