Re: RFC: [PATCH] Support incremental pack files

On Fri, 23 Feb 2007, Martin Koegler wrote:

> With CVS (or RCS), committing a version increases the required storage
> only by the size of the diff.
> 
> Committing a new version in GIT increases the storage by the compressed
> size of each changed blob. Packing all unpacked objects decreases the
> required storage, but does not generate deltas against objects in
> packs. You need to repack all objects to get around this.
> 
> For normal source code, this is not a problem.  But if you want to use
> git for big files, you waste storage (or CPU time on repacking
> everything).

Did you try repack -a -d (without -f)?

When -f is not used, already deltified objects are simply copied as is 
into the new pack without further processing.
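
As a rough illustration of that behaviour, here is a toy model in Python. 
It is not git's pack-objects code; the names PackedObject and plan_repack 
are hypothetical, and the rule "reuse a delta only if its base also ends 
up in the new pack" is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PackedObject:
    name: str
    delta_base: Optional[str] = None  # None means the object is stored whole


def plan_repack(objects: List[PackedObject],
                force: bool = False) -> Tuple[List[str], List[str]]:
    """Split objects into (copied as-is, candidates for a new delta search).

    Toy model: without `force` (the -f case), an object that is already a
    delta is reused when its base is also part of the new pack.
    """
    names = {obj.name for obj in objects}
    reused: List[str] = []
    searched: List[str] = []
    for obj in objects:
        has_usable_delta = obj.delta_base is not None and obj.delta_base in names
        if has_usable_delta and not force:
            reused.append(obj.name)      # existing delta copied unchanged
        else:
            searched.append(obj.name)    # needs the expensive delta search
    return reused, searched
```

In this model the cost of a repack without -f scales with the objects that 
are not yet deltified, while -f puts every object back through the search.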

> The following patch (against git-1.5.0-rc3) is a prototype for supporting
> incremental pack files in git. The file structures are not changed.
> It only permits the base of a delta to be located in a different pack
> or stored as an unpacked object.

We have always refused to have packs in the repository that are not self 
contained because that would pave the way for all sorts of nasty issues.  
It is otherwise much harder to prevent circular delta chains, harder to 
ensure full reachability when pruning disconnected objects at the 
hierarchical level, etc.  And those are real issues that would bite you 
as soon as you perform a single fetch or push with something other than 
the native protocol.
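
To make the circular delta chain concern concrete, here is a minimal Python 
sketch. It assumes a simplified model in which each pack is just a mapping 
from an object name to the name of its delta base; find_delta_cycle and the 
object names are hypothetical and do not correspond to anything in git.

```python
from typing import Dict, List, Optional


def find_delta_cycle(delta_base: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle of object names, or None if every chain terminates.

    `delta_base` maps an object to the object it is a delta against.
    Objects missing from the mapping are stored whole and end a chain.
    """
    for start in delta_base:
        seen: List[str] = []
        current = start
        while current in delta_base:
            if current in seen:
                return seen[seen.index(current):]
            seen.append(current)
            current = delta_base[current]
    return None


# Two packs that each look harmless when inspected alone (assumed data):
pack_1 = {"A": "B"}   # A is a delta against B, and B lives in another pack
pack_2 = {"B": "A"}   # B is a delta against A, and A lives in another pack

combined = {**pack_1, **pack_2}
```

Neither pack contains a loop on its own, but the combined view does, and 
then neither A nor B can be reconstructed. With self-contained packs the 
check never has to look beyond a single pack.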

In other words I think this is a bad idea for repository storage.  We do 
it as part of the native GIT protocol because it is obvious that there is 
no possibility for delta loops (omitted base objects in the transmitted 
pack are known to exist in the peer repository) and those packs are 
fixed up with missing objects on the receive side when not exploded into 
loose objects.
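
The receive-side fix-up can be sketched in the same simplified style. This 
Python example is an illustration under assumptions, not git's 
implementation: complete_thin_pack is a hypothetical name, a pack is 
modelled as a mapping from object name to delta base (None for whole 
objects), and missing bases are assumed to be appended as whole objects.

```python
from typing import Dict, Optional, Set


def complete_thin_pack(pack: Dict[str, Optional[str]],
                       local_objects: Set[str]) -> Dict[str, Optional[str]]:
    """Return a self-contained copy of `pack`.

    Every delta base that is absent from the received pack is looked up
    in the receiver's own objects and appended as a whole object.
    """
    completed = dict(pack)
    for base in pack.values():
        if base is None or base in completed:
            continue
        if base not in local_objects:
            # The sender promised this base exists on the receiving side.
            raise KeyError(f"missing delta base: {base}")
        completed[base] = None  # appended whole, so it needs no base itself
    return completed
```

After this step every delta in the stored pack has its base in the same 
pack, so the repository never holds a pack that depends on another one.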

Again, a repack without -f should not be that expensive.  If it is, then 
something is wrong and that should be fixed.

One thing that is too expensive in GIT is rev-list --objects --all (or 
equivalent) used to list objects to pack.  But Shawn and I have a plan 
to fix that at some point... (if only I can find some spare time to 
write more code for it).


Nicolas