On Mon, 31 Jan 2011, Shawn Pearce wrote:

> On Fri, Jan 28, 2011 at 17:32, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> >>>
> >
> >>> >> This started because I was looking for a way to speed up clones coming
> >>> >> from a JGit server.  Cloning the linux-2.6 repository is painful,
> >
> > Well, scratch the idea in this thread.  I think.
>
> Nope, I'm back in favor with this after fixing JGit's thin pack
> generation.  Here's why.
>
> Take linux-2.6.git as of Jan 12th, with the cache root as of Dec 28th:
>
>   $ git update-ref HEAD f878133bf022717b880d0e0995b8f91436fd605c
>   $ git-repack.sh --create-cache \
>       --cache-root=b52e2a6d6d05421dea6b6a94582126af8cd5cca2 \
>       --cache-include=v2.6.11-tree
>   $ git repack -a -d
>
>   $ ls -lh objects/pack/
>   total 456M
>   1.4M pack-74af5edca80797736fe4de7279b2a81af98470a5.idx
>    38M pack-74af5edca80797736fe4de7279b2a81af98470a5.pack
>
>    49M pack-d3e77c8b3045c7f54fa2fb6bbfd4dceca1e2b9fa.idx
>     89 pack-d3e77c8b3045c7f54fa2fb6bbfd4dceca1e2b9fa.keep
>   368M pack-d3e77c8b3045c7f54fa2fb6bbfd4dceca1e2b9fa.pack
>
> Our "recent history" is 38M, and our "cached pack" is 368M.  Its a bit
> more disk than is strictly necessary, this should be ~380M.  Call it
> ~26M of wasted disk.

This is fine.  When doing an incremental fetch, the thin pack does
minimize the transfer size, but it increases the stored pack size,
because a bunch of non-delta objects get appended to make the pack
complete.  What happens, though, is that when gc kicks in, the wasted
space is reclaimed.  Here, with a single pack, we wouldn't reclaim that
space, as our current heuristic is to reuse the delta (or non-delta)
pairing by default.  Maybe in that case we could simply not reuse deltas
if they're of the REF_DELTA type.

> The "cached object list" I proposed elsewhere in
> this thread would cost about 41M of disk and is utterly useless except
> for initial clones.  Here we are wasting about 26M of disk to have
> slightly shorter delta chains in the cached pack (otherwise known as
> our ancient history).  So its a slightly smaller waste, and we get
> some (minor) benefit.

Well, of course, the ancient history you're willing to keep stable for
a while could be repacked even more aggressively than usual.

> Using the cached pack increased our total data transfer by 2.39 MiB,

That's more than acceptable IMHO.  It is less than 1% of the total
transfer.

> I think this is worthwhile.  If we are afraid of the extra 2.39 MiB
> data transfer this forces on the client when the repository owner
> enables the feature, we should go back and improve our thin-pack code.
> Transferring 11 MiB to catch up a kernel from Dec 28th to Jan 12th
> sounds like a lot of data,

Well, the timing of your test corresponds to the 2.6.38 merge window,
which is a peak of activity for this repository.  Still, that probably
fits the practical usage scenario pretty well: the cache pack would be
produced on a tagged release, which happens right before the merge
window.

> and any improvements in the general
> thin-pack code would shrink the leading thin-pack, possibly getting us
> that 2.39 MiB back.

Any improvement to the thin pack would require more CPU cycles, possibly
a lot more.  So, given that this transfer overhead is already less than
1%, I don't think we need to bother.


Nicolas