Re: [PATCH] git exproll: steps to tackle gc aggression

Duy Nguyen <pclouds@xxxxxxxxx> · Sat, 10 Aug 2013 08:24:39 +0700



On Sat, Aug 10, 2013 at 5:16 AM, Jeff King <peff@xxxxxxxx> wrote:
> Another solution could involve not writing the duplicate of Y in the
> first place. The reason we do not store thin-packs on disk is that you
> run into problems with cycles in the delta graph (e.g., A deltas against
> B, which deltas against C, which deltas against A; at one point you had
> a full copy of one object which let you create the cycle, but you later
> deleted it as redundant with the delta, and now you cannot reconstruct
> any of the objects).
>
> You could possibly solve this with cycle detection, though it would be
> complicated (you need to do it not just when getting rid of objects, but
> when sending a pack, to make sure you don't send a cycle of deltas that
> the other end cannot use). You _might_ be able to get by with a kind of
> "two-level" hack: consider your main pack as "group A" and newly pushed
> packs as "group B". Allow storing thin deltas on disk from group B
> against group A, but never the reverse (nor within group B). That makes
> sure you don't have cycles, and it eliminates even more I/O than any
> repacking solution (because you never write the extra copy of Y to disk
> in the first place). But I can think of two problems:
>
>   1. You still want to repack more often than every 300 packs, because
>      having many packs cost both in space, but also in object lookup
>      time (we can do a log(N) search through each pack index, but have
>      to search linearly through the set of indices).
>
>   2. As you accumulate group B packs with new objects, the deltas that
>      people send will tend to be against objects in group B. They are
>      closer to the tip of history, and therefore make better deltas for
>      history built on top.
>
> That's all just off the top of my head. There are probably other flaws,
> too, as I haven't considered it too hard.

Some refinements on this idea

 - We could keep packs in group B ordered as the packs come in. The
new pack can depend on the previous ones.
 - A group index in addition to separate index for each pack would
solve linear search object lookup problem.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html