On Fri, Oct 05, 2018 at 10:01:31PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > There's unfortunately not a fast way of doing that. One option would be
> > to keep a counter of "ungraphed commit objects", and have callers update
> > it. Anybody admitting a pack via index-pack or unpack-objects can easily
> > get this information. Commands like fast-import can do likewise, and
> > "git commit" obviously increments it by one.
> >
> > I'm not excited about adding a new global on-disk data structure (and
> > the accompanying lock).
>
> You don't really need a new global datastructure to solve this
> problem. It would be sufficient to have git-gc itself write out a 4-line
> text file after it runs saying how many tags, commits, trees and blobs
> it found on its last run.
>
> You can then fuzzily compare object counts v.s. commit counts for the
> purposes of deciding whether something like the commit-graph needs to be
> updated, while assuming that whatever new data you have has similar
> enough ratios of those as your existing data.

I think this is basically the same thing as Stolee's suggestion to keep
the total object count in the commit-graph file. The only difference
here is that we know the actual ratio of commits to blobs for this
particular repository. But I don't think we need to know that. As you
said, this is fuzzy anyway, so a single number for "update the graph
when there are N new objects" is likely enough.

If you had a repository with an unusually large tree, you'd end up
rebuilding the graph more often. But I think that would probably be OK,
as we're primarily trying not to waste time doing a graph rebuild when
we've only done a small amount of other work. But if we just shoved a
ton of objects through index-pack, then we did a lot of work, whether
those were commit objects or not.

-Peff
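(For readers following along, the single-number heuristic being discussed
could be sketched roughly as below. The function name, the stored count,
and the threshold value are all illustrative assumptions, not actual git
internals or a proposed patch.)

```python
# Hypothetical sketch of the "update the graph when there are N new
# objects" heuristic. We compare the repository's current total object
# count against the count recorded at the last commit-graph rebuild;
# the threshold and all names here are assumptions for illustration.

REBUILD_THRESHOLD = 1000  # "N new objects"; the actual value would be tunable

def should_rebuild_graph(objects_at_last_rebuild: int, objects_now: int) -> bool:
    """Return True once enough new objects have accumulated since the
    last rebuild, regardless of whether they are commits, trees, or blobs."""
    return objects_now - objects_at_last_rebuild >= REBUILD_THRESHOLD

# A handful of new commits stays under the threshold and skips the
# rebuild; shoving a large pack through index-pack crosses it.
print(should_rebuild_graph(50_000, 50_010))  # False
print(should_rebuild_graph(50_000, 75_000))  # True
```

Note how this captures the point above: the decision keys off total work
done (objects admitted), not off the commit/blob ratio, so a repository
with an unusually large tree simply rebuilds a bit more often.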