On Sat, 26 Sep 2009, Jason Merrill wrote:

> On 09/26/2009 12:44 AM, Jason Merrill wrote:
> > git config remote.origin.fetch 'refs/remotes/*:refs/remotes/origin/*'
> > git fetch
>
> git count-objects -v before:
>
> count: 44
> size: 1768
> in-pack: 1399509
> packs: 1
> size-pack: 600456
> prune-packable: 0
> garbage: 0

I'm sure if you had done 'git rev-list --all --objects | wc -l' at that
point, the result would have been something around 900000. That's the
number of objects git actually had a reference to, as opposed to the
total number of objects contained in the object store.

> and after (transferred 278MB):
>
> count: 44
> size: 1768
> in-pack: 1947339
> packs: 2
> size-pack: 1178408
> prune-packable: 8
> garbage: 0

And those 500000 extra objects or so (minus a couple dozen which were
probably used to "complete" the fetched thin pack and are duplicates of
local objects -- the fetch progress message gave the exact number) were
obtained from the remote repository because git has no way to tell the
remote that it already had them. That's what I was explaining in my
previous email.

> and then after git gc --prune=now:
>
> count: 0
> size: 0
> in-pack: 1399613
> packs: 1
> size-pack: 839900
> prune-packable: 0
> garbage: 0
>
> So I only actually needed 104 more objects, but fetch wasn't clever enough to
> see that, and my new pack is much less efficient.

Like I said, it's not that the fetch wasn't clever enough. Rather, your
initial clone asked for way too many objects in the first place. That's
what my patch fixed.

Now the pack efficiency can be explained as well. A single pack is
always going to be more efficient than 2 packs. The problem is that when
you do a gc, git by default does the least costly operation, which
consists of copying as much data as possible from existing packs without
extra processing.
That means that many objects were copied from the second (newly
received) pack although a better delta representation was most probably
available in the other, larger pack (remember that most objects from
that second pack already existed in the first pack). Git does select
the second pack in preference to the other pack because it is more
recent, and normally more recent packs contain more recent objects,
which is a good heuristic to optimize the object enumeration. In this
case this didn't produce a good result, but again we're talking about a
scenario which is bogus from the start and shouldn't happen.

So if you do a 'git gc --aggressive' and let it run for a while, you
should get back a smaller pack, possibly even much smaller than the
original one.


Nicolas
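
To make the difference between "objects git has a reference to" and "objects contained in the object store" concrete, here is a minimal sketch on a throwaway repository. The temporary directory, the file name, and the orphan blob are made up for illustration. Only 'git rev-list --all --objects', 'git count-objects -v' and 'git gc --prune=now' are taken from the discussion above, and the numbers are tiny compared to the repository being discussed.

```shell
#!/bin/sh
# Sketch: compare reachable objects with objects present in the object store.
# Everything here runs in a scratch repository (hypothetical names throughout).
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
git config commit.gpgsign false

# One commit gives us three reachable objects: a commit, a tree and a blob.
echo "hello" > file.txt
git add file.txt
git commit -q -m "initial commit"

# Write a blob that nothing references, i.e. an unreachable object.
echo "orphan" | git hash-object -w --stdin > /dev/null

# Sum the loose ("count") and packed ("in-pack") object totals.
count_stored() {
    git count-objects -v |
        awk '/^(count|in-pack):/ { n += $2 } END { print n }'
}

# Objects reachable from any ref.
reachable=$(git rev-list --all --objects | wc -l | tr -d ' ')
stored=$(count_stored)
echo "before gc: reachable=$reachable stored=$stored"

# Repack and drop unreachable objects immediately.
git gc -q --prune=now
stored_after=$(count_stored)
echo "after gc:  reachable=$reachable stored=$stored_after"
```

Before the gc the store holds one more object than the refs can reach. After 'git gc --prune=now' the two numbers should agree, which is the same pattern as the in-pack figures quoted above, just at a much smaller scale.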