Re: [PATCH] git exproll: steps to tackle gc aggression

Duy Nguyen <pclouds@xxxxxxxxx> writes:

> On Tue, Aug 6, 2013 at 9:38 AM, Ramkumar Ramachandra <artagnon@xxxxxxxxx> wrote:
>> +               Garbage collect using a pseudo logarithmic packfile maintenance
>> +               approach.  This approach attempts to minimize packfile churn
>> +               by keeping several generations of varying sized packfiles around
>> +               and only consolidating packfiles (or loose objects) which are
>> +               either new packfiles, or packfiles close to the same size as
>> +               another packfile.
>
> I wonder if a simpler approach may be nearly as efficient as this one:
> keep the largest pack out, repack the rest at fetch/push time so there
> are at most 2 packs at a time. Or we could do the repack at 'gc
> --auto' time, but with lower pack threshold (about 10 or so). When the
> second pack is as big as, say half the size of the first, merge them
> into one at "gc --auto" time. This can be easily implemented in
> git-repack.sh.

Another random thought.

Imagine we have a cheap way to enumerate the young objects without
the usual history traversal.  For example, a list of all loose
objects plus whatever appears in the .idx files that are young.

We can reconstruct "names" for trees and blobs from such a list of
object names; if a commit in the list refers to a tree, that tree is
the top level, and a blob or a tree that appears in such a top-level
tree can be given a "name" for its place in the tree (recursively).
I suspect we would end up giving names to a large majority of the
trees and blobs in such a list by doing so.
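
To make the naming idea concrete, here is a minimal sketch in Python.
It is only an illustration under assumptions: the in-memory `objects`
mapping and the `assign_names` helper are hypothetical, not anything
git provides, and real code would have to parse the objects to fill
such a mapping in.

```python
from collections import deque

def assign_names(objects):
    """Give path names to trees and blobs without leaving the set.

    `objects` (hypothetical layout) maps an object id to:
      ("commit", tree_id)
      ("tree", [(entry_name, child_id), ...])
      ("blob", None)
    Returns {object_id: path}; a top-level tree gets the name "".
    """
    names = {}
    queue = deque()

    # Seed with the top-level trees of commits, if they are in the set.
    for oid, (kind, payload) in objects.items():
        if kind != "commit":
            continue
        root = payload
        if root in objects and objects[root][0] == "tree" and root not in names:
            names[root] = ""
            queue.append(root)

    # Walk named trees; children outside the set are simply skipped.
    while queue:
        tree = queue.popleft()
        prefix = names[tree]
        for entry_name, child in objects[tree][1]:
            if child not in objects or child in names:
                continue
            names[child] = f"{prefix}/{entry_name}" if prefix else entry_name
            if objects[child][0] == "tree":
                queue.append(child)
    return names


young = {
    "c1": ("commit", "t1"),
    "t1": ("tree", [("README", "b1"), ("src", "t2"), ("old", "bX")]),
    "t2": ("tree", [("main.c", "b2")]),
    "b1": ("blob", None),
    "b2": ("blob", None),
    "b3": ("blob", None),  # not reachable from any tree in the set
}
names = assign_names(young)
```

Here "bX" is outside the set and is never visited, and "b3" ends up
without a name, which is the minority case the idea tolerates.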

If these suspicions turn out to be true, then we could:

 - run that enumeration algorithm to come up with a set of object
   names;

 - emit the tag objects in that set in tagger timestamp order;

 - emit the commit objects in that set in commit timestamp order,
   while noting their top-level tree objects that are in the set
   and giving those the name "";

 - "traverse" the trees and blobs in that set, giving the found ones
   names (do so without stepping outside the set);

 - emit the trees and blobs with their names.  Some objects may not
   have been given any name, but that is OK as long as they are in
   the minority.

Then we would feed the result to pack-objects to produce a single
pack, and prune away the source of these young objects at the end.
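
The emission order above can be sketched as follows.  This is a rough
illustration, not an implementation: the `objects` and `names` inputs
and the `pack_input_lines` helper are hypothetical, and "timestamp
order" is assumed here to mean newest first.  Each output line is an
object name optionally followed by a path, in the style of what
"rev-list --objects" prints.

```python
def pack_input_lines(objects, names):
    """Order young objects for feeding to a packer.

    `objects` (hypothetical layout) maps an object id to (kind, timestamp);
    the timestamp is the tagger/committer time, or None for trees and blobs.
    `names` maps tree/blob ids to reconstructed paths ("" for a root tree).
    """
    def newest_first(kind):
        picked = [(oid, ts) for oid, (k, ts) in objects.items() if k == kind]
        return [oid for oid, ts in sorted(picked, key=lambda p: p[1], reverse=True)]

    lines = []
    lines.extend(newest_first("tag"))      # tags, by tagger timestamp
    lines.extend(newest_first("commit"))   # commits, by commit timestamp

    rest = [oid for oid, (k, _) in objects.items() if k in ("tree", "blob")]
    # Named trees and blobs first, each followed by its path...
    lines.extend(f"{oid} {names[oid]}" for oid in rest if oid in names)
    # ...then the unnamed minority, as bare object names.
    lines.extend(oid for oid in rest if oid not in names)
    return lines


young = {
    "tagA": ("tag", 300),
    "tagB": ("tag", 400),
    "c1": ("commit", 100),
    "c2": ("commit", 200),
    "t1": ("tree", None),
    "b1": ("blob", None),
    "b3": ("blob", None),
}
lines = pack_input_lines(young, {"t1": "", "b1": "README"})
```

The resulting lines would then be written to the standard input of
pack-objects, which I believe accepts an object name followed by a
path as a hint for delta candidate selection.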

The above could turn out to be much cheaper than the traditional
history traversal.