Re: [RFC] Add --create-cache to repack

On Mon, 31 Jan 2011, Shawn Pearce wrote:

> On Fri, Jan 28, 2011 at 17:32, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> >>> >
> >>> >> This started because I was looking for a way to speed up clones coming
> >>> >> from a JGit server.  Cloning the linux-2.6 repository is painful,
> >
> > Well, scratch the idea in this thread.  I think.
> 
> Nope, I'm back in favor with this after fixing JGit's thin pack
> generation.  Here's why.
> 
> Take linux-2.6.git as of Jan 12th, with the cache root as of Dec 28th:
> 
>   $ git update-ref HEAD f878133bf022717b880d0e0995b8f91436fd605c
>   $ git-repack.sh --create-cache \
>       --cache-root=b52e2a6d6d05421dea6b6a94582126af8cd5cca2 \
>       --cache-include=v2.6.11-tree
>   $ git repack -a -d
> 
>   $ ls -lh objects/pack/
>   total 456M
>   1.4M pack-74af5edca80797736fe4de7279b2a81af98470a5.idx
>   38M pack-74af5edca80797736fe4de7279b2a81af98470a5.pack
> 
>   49M pack-d3e77c8b3045c7f54fa2fb6bbfd4dceca1e2b9fa.idx
>   89 pack-d3e77c8b3045c7f54fa2fb6bbfd4dceca1e2b9fa.keep
>   368M pack-d3e77c8b3045c7f54fa2fb6bbfd4dceca1e2b9fa.pack
> 
> Our "recent history" is 38M, and our "cached pack" is 368M.  It's a bit
> more disk than is strictly necessary; this should be ~380M.  Call it
> ~26M of wasted disk.

This is fine.  When doing an incremental fetch, the thin pack does 
minimize the transfer size, but it does increase the stored pack size by 
appending a bunch of non-delta objects to make the pack complete.

What happens, though, is that when gc kicks in, the wasted space is 
collected back.  Here, with a single pack, we wouldn't claim that space 
back, as our current heuristic is to reuse delta (and non-delta) pairings 
by default.  Maybe in that case we could simply not reuse deltas if 
they're of the REF_DELTA type.
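
To make that suggestion concrete, here is a minimal sketch of such a 
reuse policy.  The function name and its parameters are hypothetical and 
this is not the actual pack-objects code; only the two type codes are 
taken from the pack format.

```python
# Delta object type codes as defined by the pack format.
OBJ_OFS_DELTA = 6  # base is referenced by its offset within the same pack
OBJ_REF_DELTA = 7  # base is referenced by its object name


def should_reuse_delta(stored_type, reuse_enabled=True):
    """Hypothetical policy: reuse an existing delta unless it is a REF_DELTA.

    Returning False means the object would go through a fresh delta search
    instead of being copied as-is from the existing pack.
    """
    if not reuse_enabled:
        return False
    return stored_type != OBJ_REF_DELTA


if __name__ == "__main__":
    for name, code in (("OFS_DELTA", OBJ_OFS_DELTA), ("REF_DELTA", OBJ_REF_DELTA)):
        print(name, "reuse" if should_reuse_delta(code) else "recompute")
```

Under this sketch, only entries stored as REF_DELTA would be 
reconsidered, so the bulk of the pack would still benefit from reuse.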

> The "cached object list" I proposed elsewhere in
> this thread would cost about 41M of disk and is utterly useless except
> for initial clones.  Here we are wasting about 26M of disk to have
> slightly shorter delta chains in the cached pack (otherwise known as
> our ancient history).  So it's a slightly smaller waste, and we get
> some (minor) benefit.

Well, of course the ancient history you're willing to keep stable for a 
while could be repacked even more aggressively than usual.

> Using the cached pack increased our total data transfer by 2.39 MiB,

That's more than acceptable IMHO. That's less than 1% of the total 
transfer.
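
A rough check of that figure, assuming the total clone transfer is about 
the sum of the two pack sizes in the listing quoted earlier (the thread 
does not state the exact total, so this is only an approximation):

```python
# Pack sizes from the quoted `ls -lh` listing, in MiB.
recent_pack_mib = 38
cached_pack_mib = 368

# Extra transfer reported when the cached pack is used.
overhead_mib = 2.39

# Assumption: total transfer is roughly the sum of both packs.
total_mib = recent_pack_mib + cached_pack_mib
overhead_pct = 100 * overhead_mib / total_mib

print(f"{overhead_mib} MiB of ~{total_mib} MiB is {overhead_pct:.2f}%")
```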

> I think this is worthwhile.  If we are afraid of the extra 2.39 MiB
> data transfer this forces on the client when the repository owner
> enables the feature, we should go back and improve our thin-pack code.
>  Transferring 11 MiB to catch up a kernel from Dec 28th to Jan 12th
> sounds like a lot of data, 

Well, your timing for this test corresponds with the 2.6.38 merge window 
which is a high activity peak for this repository.  Still, that would 
probably fit the usage scenario in practice pretty well where the cache 
pack would be produced on a tagged release which happens right before 
the merge window.


> and any improvements in the general
> thin-pack code would shrink the leading thin-pack, possibly getting us
> that 2.39 MiB back.

Any improvement to the thin pack would require more CPU cycles, possibly 
a lot more.  So given that this transfer overhead is already less than 
1%, I don't think we need to bother.


Nicolas
