Duy Nguyen <pclouds@xxxxxxxxx> writes:

> On Tue, Aug 6, 2013 at 9:38 AM, Ramkumar Ramachandra <artagnon@xxxxxxxxx> wrote:
>> +	Garbage collect using a pseudo logarithmic packfile maintenance
>> +	approach.  This approach attempts to minimize packfile churn
>> +	by keeping several generations of varying sized packfiles around
>> +	and only consolidating packfiles (or loose objects) which are
>> +	either new packfiles, or packfiles close to the same size as
>> +	another packfile.
>
> I wonder if a simpler approach may be nearly as efficient as this one:
> keep the largest pack out, repack the rest at fetch/push time so there
> are at most 2 packs at a time. Or we could do the repack at 'gc
> --auto' time, but with lower pack threshold (about 10 or so). When the
> second pack is as big as, say half the size of the first, merge them
> into one at "gc --auto" time. This can be easily implemented in
> git-repack.sh.

Another random thought.

Imagine we have a cheap way to enumerate the young objects without
the usual history traversal.  For example, a list of all loose
objects and what appears in the .idx files that are young.

We can reconstruct "names" for trees and blobs from such a list of
object names; if a commit in the list refers to a tree, that tree is
the top level, and a blob or a tree that appears in such a top-level
tree can be given a "name" for its place in the tree (recursively).
I suspect we would end up giving names to a large majority of the
trees and blobs in such a list by doing so.
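
To make the naming step concrete, here is a rough sketch in Python
over a toy in-memory model.  Everything in it is an assumption made
for illustration: the function name assign_names, the short made-up
object ids, and the idea that commits and trees have already been
parsed into plain dicts.  It is not how git stores or reads objects.

```python
from collections import deque


def assign_names(young, commits, trees):
    """Give path-like names to young trees and blobs.

    young   -- set of object ids considered "young" (hypothetical ids)
    commits -- dict: commit id -> id of its top-level tree
    trees   -- dict: tree id -> list of (entry name, child object id)
    Returns a dict: object id -> name, with "" for a top-level tree.
    """
    names = {}
    queue = deque()

    # A tree referenced by a young commit is a top-level tree: name "".
    for commit_id, root in commits.items():
        if commit_id in young and root in young and root not in names:
            names[root] = ""
            queue.append(root)

    # Walk down from the named trees, never stepping outside the set.
    while queue:
        tree_id = queue.popleft()
        prefix = names[tree_id]
        for entry, child in trees.get(tree_id, ()):
            if child not in young or child in names:
                continue  # old object, or already named via another path
            names[child] = f"{prefix}/{entry}" if prefix else entry
            if child in trees:
                queue.append(child)  # subtree: keep descending
    return names


young = {"c1", "t_root", "t_src", "b_main", "b_orphan"}
commits = {"c1": "t_root"}
trees = {
    "t_root": [("src", "t_src"), ("README", "b_old")],
    "t_src": [("main.c", "b_main")],
}
names = assign_names(young, commits, trees)
```

In this toy input b_old is skipped because it is not young, and
b_orphan stays unnamed because no young tree refers to it; that is
the "minority" case the approach has to tolerate.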
If these suspicions turn out to be true, then we could:

 - run that enumeration algorithm to come up with a set of object
   names;

 - emit the tag objects in that set in the tagger timestamp order;

 - emit the commit objects in that set in the commit timestamp
   order, while noting the tree objects contained in the set, giving
   them name "";

 - "traverse" the trees and blobs in that set, giving the found ones
   names (do so without stepping outside the set);

 - emit the trees and blobs with their names.  Some objects may not
   have been given any name, but that is OK as long as they are in
   the minority.

We would then feed the result to pack-objects to produce a single
pack, and prune away the source of these young objects in the end.

The above could turn out to be much cheaper than the traditional
history traversal.
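
The emission order in the steps above could be sketched like this,
again in Python over made-up data.  The function name
emission_order, the short object ids, and the input dicts are all
hypothetical.  Names are assumed to have been assigned already and
are passed in as a dict.  Two details are my own guesses and not
something stated above: timestamps are sorted newest first, and
trees and blobs are sorted by name with the unnamed ones last.  The
output lines have the "<id> <name>" shape that "git rev-list
--objects" prints and that pack-objects reads on its standard input.

```python
def emission_order(objects, names, newest_first=True):
    """Return the lines to emit, in the order described in the steps.

    objects -- dict: object id -> (type, timestamp or None);
               type is "tag", "commit", "tree" or "blob"
    names   -- dict: object id -> name ("" for a top-level tree)
    newest_first -- assumption: which way "timestamp order" runs
    """
    def by_time(kind):
        picked = [(ts, oid) for oid, (typ, ts) in objects.items()
                  if typ == kind]
        picked.sort(reverse=newest_first)
        return [oid for _, oid in picked]

    lines = by_time("tag")        # tags, in tagger timestamp order
    lines += by_time("commit")    # commits, in commit timestamp order

    rest = [oid for oid, (typ, _) in objects.items()
            if typ in ("tree", "blob")]
    named = sorted((oid for oid in rest if oid in names),
                   key=lambda oid: (names[oid], oid))
    unnamed = sorted(oid for oid in rest if oid not in names)

    # Named trees and blobs carry their name; a top-level tree has "".
    lines += [f"{oid} {names[oid]}".rstrip() for oid in named]
    lines += unnamed              # the tolerated unnamed minority
    return lines


objects = {
    "tagA": ("tag", 30),
    "c1": ("commit", 10),
    "c2": ("commit", 20),
    "t_root": ("tree", None),
    "b_main": ("blob", None),
    "b_orphan": ("blob", None),
}
names = {"t_root": "", "b_main": "src/main.c"}
lines = emission_order(objects, names)
```

Writing these lines to the standard input of pack-objects and then
pruning the old packs and loose objects would be a separate step,
not shown here.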