Re: git clone sending unneeded objects

On Sat, 26 Sep 2009, Jason Merrill wrote:

> On 09/26/2009 12:44 AM, Jason Merrill wrote:
> > git config remote.origin.fetch 'refs/remotes/*:refs/remotes/origin/*'
> > git fetch
> 
> git count-objects -v before:
> 
> count: 44
> size: 1768
> in-pack: 1399509
> packs: 1
> size-pack: 600456
> prune-packable: 0
> garbage: 0

I'm sure if you had done 'git rev-list --all --objects | wc -l' at that 
point, the result would have been something around 900000.  That's the 
actual number of objects git had a reference to, compared to the total 
objects contained in the object store.
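
As a rough sketch of that comparison (not something from the original
exchange), the two numbers can be printed side by side.  The helper name
object_summary is hypothetical.  It assumes it is run inside a git
repository, and the "unreferenced" figure is only approximate: objects
stored in more than one pack are counted once per pack, and objects
reachable only from reflogs are not seen by rev-list --all.

```shell
# Hypothetical helper: compare the number of objects reachable from refs
# with the number of objects physically present in the object store.
object_summary() {
    # Objects reachable from any ref (commits, trees, blobs, tags).
    reachable=$(git rev-list --all --objects | wc -l | tr -d ' ')
    # Loose objects ("count:") plus packed objects ("in-pack:").
    stored=$(git count-objects -v |
        awk '$1 == "count:" || $1 == "in-pack:" { n += $2 } END { print n + 0 }')
    echo "reachable: $reachable"
    echo "stored: $stored"
    echo "unreferenced (approx.): $((stored - reachable))"
}
```

Calling object_summary in the repository before the fetch should, by the
reasoning above, show a reachable count well below the in-pack count.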

> and after (transferred 278MB):
> 
> count: 44
> size: 1768
> in-pack: 1947339
> packs: 2
> size-pack: 1178408
> prune-packable: 8
> garbage: 0

And those 500000 extra objects or so (minus a couple dozen which were 
probably used to "complete" the fetched thin pack and are duplicates of 
local objects -- the fetch progress message gave the exact number) were 
obtained from the remote repository because git has no way to tell the 
remote it already had them.  That's what I was explaining in my previous 
email.

> and then after git gc --prune=now:
> 
> count: 0
> size: 0
> in-pack: 1399613
> packs: 1
> size-pack: 839900
> prune-packable: 0
> garbage: 0
> 
> So I only actually needed 104 more objects, but fetch wasn't clever enough to
> see that, and my new pack is much less efficient.

Like I said, it's not that the fetch wasn't clever enough.  Rather, 
your initial clone asked for way too many objects in the first place.  
That's what my patch fixed.
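
For reference, the object counts quoted above can be checked with plain
shell arithmetic.  The three in-pack values are copied from the
count-objects output in this thread; the variable names are only
illustrative.

```shell
# in-pack values taken from the quoted git count-objects -v output.
before_fetch=1399509
after_fetch=1947339
after_gc=1399613

# Objects added to the object store by the fetch (two packs combined).
fetched=$((after_fetch - before_fetch))
# Objects that remained once everything was repacked into one pack.
needed=$((after_gc - before_fetch))

echo "added by fetch: $fetched"                # added by fetch: 547830
echo "kept after gc:  $needed"                 # kept after gc:  104
echo "not kept:       $((fetched - needed))"   # not kept:       547726
```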

Now the pack efficiency can be explained as well.  A single pack is 
always going to be more efficient than 2 packs.  The problem is that 
when you do a gc, by default git does the least costly operation, which 
consists of copying as much data as possible from existing packs without 
extra processing.  That means that many objects were copied from the 
second (newly received) pack although a better delta representation was 
most probably available in the other, larger pack (remember that most 
objects from that second pack already existed in the first pack).  Git 
does select the second pack in preference to the other pack because it 
is more recent, and normally more recent packs contain more recent 
objects, which is a good heuristic for optimizing the object 
enumeration.  In this case this didn't produce a good result, but again 
we're talking about a scenario which is bogus from the start and 
shouldn't happen.

So if you do a 'git gc --aggressive' and let it run for a while, you 
should get back a smaller pack, possibly even much smaller than the 
original one.
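
A minimal sketch of how one might check that, assuming it is run inside
the repository in question.  The helper names pack_size_kib and
repack_and_report are hypothetical, and the size-pack figure is the one
git count-objects -v reports (in KiB).

```shell
# Hypothetical helper: print the size-pack value (KiB) for the
# repository in the current directory.
pack_size_kib() {
    git count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}

# Repack aggressively and report the pack size before and after.
# Deltas are recomputed rather than reused, so this can take a long
# time on a large repository.
repack_and_report() {
    before=$(pack_size_kib)
    git gc --aggressive --quiet
    after=$(pack_size_kib)
    echo "size-pack before: $before KiB"
    echo "size-pack after: $after KiB"
}
```

Running repack_and_report once and comparing the two lines is enough to
see whether the aggressive repack actually paid off.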


Nicolas