Re: RFC: [PATCH] Support incremental pack files

mkoegler@xxxxxxxxxxxxxxxxx (Martin Koegler) writes:

> Committing a new version in GIT increases the storage by the compressed
> size of each changed blob. Packing all unpacked objects decreases the
> required storage, but does not generate deltas against objects in
> packs. You need to repack all objects to get around this.
>
> For normal source code, this is not a problem.  But if you want to use
> git for big files, you waste storage (or CPU time for everything
> repacking).

Three points that might help you without any code change.

 - Have you run "git repack -a -d" without "-f"?  Reusing of
   existing delta is specifically designed to avoid the "CPU
   time for everything repacking" problem.

 - If you are dealing with something other than "normal source
   code", do you know if your objects delta against each other
   well?  If not, turning core.legacyheaders off might be a
   win.  It allows the objects that are recorded as non-delta in
   the resulting pack to be copied straight from loose objects.

 - Once you have accumulated large enough packs with existing
   objects, marking them with .keep files would leave them
   untouched during subsequent repacks.  When "git repack -a -d"
   repacks "everything", its definition of "everything" becomes
   "except things that are in packs marked with .keep files".

Side note: Is the .keep mechanism sufficiently documented?  I am
too lazy to check that right now, but here is a tip.  After
releasing the big one, like v1.5.0, I do:

  $ P=.git/objects/pack
  $ git rev-list --objects v1.5.0 |
    git pack-objects --delta-base-offset \
          --depth=30 --window=100 --no-reuse-delta pack
  ...
  6fba5cb8ed92dfef71ff47def9f95fa1e703ba59
  $ mv pack-6fba5cb8ed92dfef71ff47def9f95fa1e703ba59.* $P/
  $ echo 'Post 1.5.0' >$P/pack-6fba5cb8ed92dfef71ff47def9f95fa1e703ba59.keep
  $ git gc --prune

This does three things:

 - It packs everything reachable from v1.5.0 with a delta chain
   deeper than the default.

 - The pack is installed in the object store; the presence of
   the .keep file (its contents do not matter) tells subsequent
   repacks not to touch it.

 - Then the remaining objects are packed into a different pack.

With this, the repository uses two packs: one that I'll keep
until it's time to do the big repack again, and another that is
constantly recreated by repacking but contains only "recent"
objects.
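To check which pack plays which role, a hypothetical helper like
pack_status below (plain POSIX shell, not part of git) can list
each pack in the directory and say whether a matching .keep file
protects it from the next "git repack -a -d".

```shell
# pack_status [DIR]: hypothetical helper, not part of git.
# Prints "kept NAME" for each pack that has a matching .keep file and
# "repack NAME" for the rest, i.e. the packs a "git repack -a -d"
# would be free to fold into its new pack.
pack_status() {
    dir=${1:-.git/objects/pack}
    for pack in "$dir"/pack-*.pack; do
        # Skip the unexpanded glob when the directory has no packs.
        [ -e "$pack" ] || continue
        if [ -e "${pack%.pack}.keep" ]; then
            echo "kept ${pack##*/}"
        else
            echo "repack ${pack##*/}"
        fi
    done
}
```

In the setup described above, this would be expected to print one
"kept" line for the v1.5.0 pack and one "repack" line for the pack
holding the recent objects.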

> It only permits the base commit of a delta to be located in a
> different pack or as an unpacked object.

This "only" change needs to be done _very_ carefully, since
self-containedness of pack files is one of the important
elements of the stability of a git repository.

In effect, by allowing an incremental pack to contain a delta
against a loose object, you are turning the link between a delta
and its base object into a new type of "reachability" for the
purpose of fsck/prune.  I am not saying it is a bad idea, but
making sure you have covered every case where you could lose
necessary objects will be a lot of work.

For example, suppose a delta in your incremental pack is based
on a loose object.  That loose object can become unreachable
after rewinding or rebasing your refs.  You would have to somehow
arrange for git-prune to know about this situation and keep the
object from being pruned -- otherwise your incremental pack
becomes corrupt.
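The extra reachability edge can be modelled with a toy sketch.
Everything here is hypothetical and far simpler than what git-prune
actually does: object names are plain strings in text files, EDGES
lists "DELTA BASE" pairs for deltas stored in packs, and the
function prints the loose objects that would be safe to delete.  A
naive prune looks only at the reachable list; the sketch also keeps
the base of every packed delta, which is exactly the case where the
incremental pack would otherwise become corrupt.

```shell
# prune_candidates REACHABLE EDGES LOOSE  (hypothetical toy model)
#   REACHABLE: file, one object name per line, reachable from the refs
#   EDGES:     file, "DELTA BASE" per line, for deltas stored in packs
#   LOOSE:     file, one loose object name per line
# Prints the loose objects that are neither reachable nor needed as the
# base of a packed delta.  Loose objects are stored whole, so a loose
# base has no further base of its own that would need protecting.
prune_candidates() {
    awk '
        FILENAME == ARGV[1] { keep[$1] = 1; next }  # reachable objects
        FILENAME == ARGV[2] { keep[$2] = 1; next }  # bases of packed deltas
        !($1 in keep)                               # loose and unneeded
    ' "$1" "$2" "$3"
}
```

For example, with b2 in the reachable list, the line "b2 b1" in
the edges file, and b1 and x as loose objects (b1 having become
unreachable after a rewind), only x is printed; with an empty edges
file, b1 would be printed as well and pruning it would break the
pack holding b2.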

And that is just one example I could come up with after seeing
your message in 3 minutes while watching TV ;-).  I would
usually say "I am sure there will be more...", but in this
particular case, I am inclined to say that I do not even want to
start thinking about possible fallout from this.  It's scary.


