Re: Huge win, compressing a window of delta runs as a unit

On Thu, 17 Aug 2006, Shawn Pearce wrote:

> I'm going to try to integrate this into core GIT this weekend.
> My current idea is to make use of the OBJ_EXT type flag to add
> an extended header field behind the length which describes the
> "chunk" as being a delta chain compressed in one zlib stream.
> I'm not overly concerned about saving lots of space in the header
> here as it looks like we're winning a huge amount of pack space,
> so the extended header will probably itself be a couple of bytes.
> This keeps the shorter reserved types free for other great ideas.  :)

We're striving for optimal data storage here, so don't be afraid to use 
one of the available types for an "object stream" object.  When you 
think about it, deflating multiple objects into a single zlib stream 
can be applied to all object types, not only deltas.  If deflating many 
blobs into one zlib stream is ever deemed worth it, the encoding will 
already be ready for it.  You can also leverage the existing code to 
write headers, etc.

I'd suggest you use OBJ_GROUP = 0 as a new primary object type.  The 
"size" field in the header then becomes the number of objects included 
in the group.  Most of the time that will fit in the low 4 bits of the 
first header byte, but if there are more than 15 grouped objects then 
more bits can be used in the following bytes.  All the code to generate 
and parse that is already there.  If there is ever a need for more 
extensions, they could be prefixed with a pure zero byte (an object 
group with a zero object count, which is distinguishable from a real 
group).
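
To make the header arithmetic concrete, here is a rough Python sketch of 
the existing pack header encoding (3 type bits and the low 4 size bits in 
the first byte, then 7 size bits per continuation byte) reused for a group 
count.  OBJ_GROUP is only the value proposed above, not an existing type, 
and the function names are made up for illustration.

```python
OBJ_GROUP = 0  # hypothetical: the type value proposed in this message


def encode_header(obj_type, size):
    """Encode (type, size) the way pack object headers are laid out."""
    byte = (obj_type << 4) | (size & 0x0F)  # type in bits 6-4, low 4 size bits
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # MSB set: more size bits follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def decode_header(buf, pos=0):
    """Return (type, size, position just past the header)."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos
```

With this layout a group of up to 15 objects costs one header byte, a 
group of 16 costs two, and a count of zero yields the pure zero byte that 
could mark future extensions.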

Then, having the number of grouped objects, you just have to list the 
usual headers for those objects, which are their type and inflated size 
just like regular object headers, including the base sha1 for deltas.  
Again you already have code to produce and parse those.

And finally, just append the objects' payloads as a single deflated stream.
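
Putting the three pieces together (group header, per-object headers, one 
deflated stream), a writer might look roughly like the Python sketch below.  
This is only an illustration of the layout described above, not actual 
pack-writing code: OBJ_GROUP and write_group are hypothetical names, and 
the type values 3 (blob) and 7 (delta with a 20-byte base sha1) are the 
ones I assume from the current pack format.

```python
import zlib

OBJ_GROUP = 0  # hypothetical: the type value proposed in this message
OBJ_BLOB = 3   # assumed from the current pack format
OBJ_DELTA = 7  # assumed: delta whose header is followed by the base sha1


def encode_header(obj_type, size):
    """Pack-style header: type and low 4 size bits, then 7 bits per byte."""
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def write_group(objects):
    """objects: list of (type, payload, base_sha1 or None) tuples."""
    # Group header: the "size" field carries the number of objects.
    out = bytearray(encode_header(OBJ_GROUP, len(objects)))
    # Usual per-object headers: type, inflated size, base sha1 for deltas.
    for obj_type, payload, base_sha1 in objects:
        out += encode_header(obj_type, len(payload))
        if obj_type == OBJ_DELTA:
            out += base_sha1  # 20 raw bytes
    # All payloads, concatenated in order, in a single zlib stream.
    out += zlib.compress(b"".join(payload for _, payload, _ in objects))
    return bytes(out)
```

Since every inflated size is known before the stream starts, a reader can 
compute each object's offset inside the inflated data from the headers 
alone.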

This way, reading an object from a group can be optimized when the 
object data is located near the beginning of the stream: you only need 
to inflate the bytes leading up to the desired data (possibly caching 
them for further delta replaying), inflate the data for the desired 
object, and then ignore the remainder of the stream.
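
The partial read can be sketched with Python's zlib module, which lets a 
decompressor stop after a bounded amount of output.  This is an 
illustration only: read_from_stream is a made-up name, and the offset and 
size are assumed to have been computed from the per-object headers.

```python
import zlib


def read_from_stream(deflated, offset, size):
    """Inflate only far enough to return inflated[offset:offset + size]."""
    d = zlib.decompressobj()
    out = bytearray()
    need = offset + size
    data = deflated
    while len(out) < need:
        # max_length bounds the output, so inflation stops once we have enough.
        piece = d.decompress(data, need - len(out))
        out += piece
        data = d.unconsumed_tail  # input not yet processed
        if not piece and not data:
            break  # stream ended before reaching the requested range
    # out[:offset] holds the leading objects and could be cached for
    # later delta replaying; the rest of the stream is never inflated.
    return bytes(out[offset:need])


payloads = [b"first object\n", b"second object\n" * 10, b"third"]
stream = zlib.compress(b"".join(payloads))
first = read_from_stream(stream, 0, len(payloads[0]))
second = read_from_stream(stream, len(payloads[0]), len(payloads[1]))
```

The cost of a read grows with how deep the object sits in the stream, so 
placing the most frequently needed objects first keeps the common case 
cheap.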


Nicolas