mkoegler@xxxxxxxxxxxxxxxxx (Martin Koegler) writes:

> Committing a new version in GIT increases the storage by the compressed
> size of each changed blob. Packing all unpacked objects decreases the
> required storage, but does not generate deltas against objects in
> packs. You need to repack all objects to get around this.
>
> For normal source code, this is not a problem. But if you want to use
> git for big files, you waste storage (or CPU time for everything
> repacking).

Three points that might help you without any code change.

 - Have you run "git repack -a -d" without "-f"?  Reuse of existing
   deltas is specifically designed to avoid the "CPU time for
   everything repacking" problem.

 - If you are dealing with something other than "normal source
   code", do you know if your objects delta against each other
   well?  If not, turning core.legacyheaders off might be a win.
   It allows the objects that are recorded as non-delta in the
   resulting pack to be copied straight from loose objects.

 - Once you have accumulated large enough packs with existing
   objects, marking them with .keep leaves them untouched during
   subsequent repacks.  When "git repack -a -d" repacks
   "everything", its definition of "everything" becomes "except
   things that are in packs marked with .keep files".

Side note: Is the .keep mechanism sufficiently documented?  I am
too lazy to check that right now, but here is a tip.  After
releasing a big one, like v1.5.0, I do:

    $ P=.git/objects/pack
    $ git rev-list --objects v1.5.0 | git pack-objects --delta-base-offset \
        --depth=30 --window=100 --no-reuse-delta pack
    ...
    6fba5cb8ed92dfef71ff47def9f95fa1e703ba59
    $ mv pack-6fba5cb8ed92dfef71ff47def9f95fa1e703ba59.* $P/
    $ echo 'Post 1.5.0' >$P/pack-6fba5cb8ed92dfef71ff47def9f95fa1e703ba59.keep
    $ git gc --prune

This does three things:

 - It packs everything reachable from v1.5.0 with a delta chain
   that is deeper than the default.

 - The pack is installed in the object store; the presence of the
   .keep file (its contents do not matter) tells subsequent
   repacks not to touch it.

 - Then the remaining objects are packed into a different pack.

With this, the repository uses two packs: one is what I'll keep
until it's time to do the big repack again, and the other is
constantly recreated by repacking but contains only "recent"
objects.

> It only permits, that the base commit of a delta is located in a
> different pack or as unpacked object.

This "only" change needs to be done _very_ carefully, since
self-containedness of pack files is one of the important
elements of the stability of a git repository.

In effect, you are making the delta and its base object into a
new type of "reachability" for the purpose of fsck/prune by
allowing an incremental pack to contain a delta against a loose
object.  I am not saying it is a bad idea, but making sure you
have covered every case in which you could lose necessary
objects will be a lot of work.

For example, suppose a delta in your incremental pack is based
on a loose object.  That loose object can become unreachable
after rewinding or rebasing your refs.  You have to somehow
arrange that git-prune knows about this situation and keeps the
object from getting pruned -- otherwise your incremental pack
becomes corrupt.

And that is just one example I could come up with in 3 minutes
after seeing your message, while watching TV ;-).  I would
usually say "I am sure there will be more...", but in this
particular case, I am inclined to say that I do not even want to
start thinking about possible fallout from this.  It's scary.
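
For anybody who wants to see the .keep behaviour described earlier
in this message for themselves, here is a rough sketch that plays
it out in a throwaway repository.  The temporary repository, the
file name and the tag name "base" are all made up for the demo, and
the pack is written straight into the object store instead of being
moved there afterwards; treat it as an illustration, not as a
recommended procedure.

```shell
#!/bin/sh
# Sketch: check that a pack marked with .keep survives "git repack -a -d".
# The repository, the file name and the tag "base" are hypothetical.
set -e

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Demo"
git -C "$repo" config user.email "demo@example.invalid"

# The first commit plays the role of the "big release".
echo one >"$repo/file"
git -C "$repo" add file
git -C "$repo" commit -q -m "first"
git -C "$repo" tag base

# Pack everything reachable from the tag into the object store.
# pack-objects prints the name (hash) of the pack it wrote on stdout.
P="$repo/.git/objects/pack"
kept=$(git -C "$repo" rev-list --objects base |
       git -C "$repo" pack-objects -q "$P/pack")

# The contents of the .keep file do not matter, only its presence.
echo 'kept pack' >"$P/pack-$kept.keep"

# Newer history, which the repack should put into a second pack.
echo two >>"$repo/file"
git -C "$repo" commit -q -a -m "second"
git -C "$repo" repack -a -d -q

count=$(ls "$P"/pack-*.pack | wc -l | tr -d ' ')
echo "packs: $count, kept pack: pack-$kept.pack"
```

If things work the way I described, the kept pack should still be
there after the repack, next to one new pack that holds only the
objects introduced by the second commit.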