Re: Mozilla .git tree

Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> If you're going to redo the pack formats another big win for the
> Mozilla pack is to convert pack internal sha1 references into file
> offsets within the pack. Doing that will take around 30MB off from the
> Mozilla pack size. sha1's are not compressible so this is a direct
> savings.

Right now Junio's working on the index to break the 4 GiB barrier.
I think Junio and Nico have already agreed to change the base SHA1
to be an offset instead.  That is a problem for the current layout,
though: the base gets written out behind the delta, so you need to
know exactly how many bytes the delta is going to be before you can
compute the offset correctly.
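
As a rough illustration of that ordering problem, here is a minimal
Python sketch.  It is not Git code: the layout, the varint encoding and
every function name are assumptions made up for the example.  It assumes
the delta is written first with its base directly behind it, and that
the reference is a forward byte count.

```python
import zlib

def encode_varint(n):
    """Hypothetical 7-bit little-endian varint (not Git's real encoding)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf, pos=0):
    """Return (value, position just past the varint)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos

def write_delta_then_base(delta, base):
    """Assumed layout: [offset varint][compressed delta][compressed base]."""
    packed_delta = zlib.compress(delta)
    # The distance to the base equals the compressed delta size, so the
    # delta has to be compressed before the offset can be written.
    offset = len(packed_delta)
    return encode_varint(offset) + packed_delta + zlib.compress(base)

def read_base(pack):
    """Follow the offset to the base without consulting any index."""
    offset, after_header = decode_varint(pack)
    return zlib.decompress(pack[after_header + offset:])
```

The writer cannot stream the offset out ahead of the delta; it either
buffers the compressed delta or goes back and patches the header.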
 
> This might reduce memory usage too. The index is only needed to get
> the initial object from the pack. Since index use is lighter it could
> just be open/closed when needed.

True; however when you are walking a series of commits (to produce
output for `git log` for example) every time you parse a commit you
need to go back to the .idx to relookup the ancestor commit(s).
So you don't want to open/close the .idx file on every object;
instead put the .idx file into the LRU like the .pack files are
(or into their own LRU chain) and maintain some threshold on how
many bytes worth of .idx is kept live.
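
A byte-bounded LRU of that kind could look roughly like the Python
sketch below.  The class name, the `loader` callback and the threshold
are all hypothetical; real code would mmap the .idx files rather than
hold their bytes in a dictionary.

```python
from collections import OrderedDict

class IdxCache:
    """Keep recently used .idx contents live, bounded by total bytes."""

    def __init__(self, max_bytes, loader):
        self.max_bytes = max_bytes  # threshold on live .idx bytes
        self.loader = loader        # hypothetical: name -> bytes
        self.live = OrderedDict()   # oldest entry first
        self.live_bytes = 0

    def get(self, name):
        if name in self.live:
            self.live.move_to_end(name)  # mark as most recently used
            return self.live[name]
        data = self.loader(name)
        self.live[name] = data
        self.live_bytes += len(data)
        # Evict least recently used entries until under the threshold,
        # but never drop the entry that was just loaded.
        while self.live_bytes > self.max_bytes and len(self.live) > 1:
            _, old = self.live.popitem(last=False)
            self.live_bytes -= len(old)
        return data
```

Walking a commit chain then calls `get()` once per lookup, and only a
miss pays the cost of reopening the file.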
 
> You could also introduce a zlib dictionary object into the format and
> just leave it empty for now.

No.  I'm not sure I'm ready to propose that as a solution for
decreasing pack size.  Now that my exams are over I've started
working on a true dictionary based compression implementation.
I want to try to get Git itself repacked under it, then try the
Mozilla pack after I get my new amd64 based system built.

If that's as big a space saver as we're hoping it will be, then
the pack format would be radically different and need to change;
if it doesn't gain us anything (or is worse!) then we can go back
to the drawing board and consider other pack format changes such as
a zlib dictionary.  But right now the zlib dictionary's measly 4%
gain isn't very much.

-- 
Shawn.
