Re: Huge win, compressing a window of delta runs as a unit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> Shawn put together a new version of his import utility that packs all
> of the deltas from a run into a single blob instead of one blob per
> delta. The idea is to put 10 or more deltas into each delta entry
> instead of one. The index format would map the 10 sha1's to a single
> packed delta entry which would be expanded when needed. Note that you
> probably needed multiple entries out of the delta pack to generate the
> revision you were looking for so this is no real loss on extraction.
> 
> I ran it overnight on mozcvs. If his delta pack code is correct this
> is a huge win.
> 
> One entry per delta -  845,42,0150
> Packed deltas - 295,018,474
> 65% smaller
> 
> The effect of packing the deltas is to totally eliminate many of the
> redundant zlib dictionaries.

I'm going to try to integrate this into core GIT this weekend.
My current idea is to make use of the OBJ_EXT type flag to add
an extended header field behind the length which describes the
"chunk" as being a delta chain compressed in one zlib stream.
I'm not overly concerned about saving lots of space in the header
here as it looks like we're winning a huge amount of pack space,
so the extended header will probably itself be a couple of bytes.
This keeps the shorter reserved types free for other great ideas.  :)

My primary goal of integrating it into core GIT is to take
advantage of verify-pack to check the file fast-import is producing.
Plus having support for it in sha1_file.c will make it easier to
performance test the common access routines that need to be fast,
like commit and tree walking.

My secondary goal is to get a patchset which other folks can try
on their own workloads to see if its as effective as what Jon is
seeing on the Mozilla archive.


Unfortunately I can't think of a way to make this type of pack
readable by older software.  So this could be creating a pretty
big change in the pack format, relatively speaking.  :)

-- 
Shawn.
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]