Re: We should add a "git gc --auto" after "git clone" due to commit graph

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 03 Oct 2018 16:18:33 +0200

On Wed, Oct 03 2018, Derrick Stolee wrote:

> On 10/3/2018 9:36 AM, SZEDER Gábor wrote:
>> On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>> Don't have time to patch this now, but thought I'd send a note / RFC
>>> about this.
>>>
>>> Now that we have the commit graph it's nice to be able to set
>>> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
>>> /etc/gitconfig to apply them to all repos.
>>>
>>> But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
>>> until whenever my first "gc" kicks in, which may be quite some time if
>>> I'm just using it passively.
>>>
>>> So we should make "git gc --auto" be run on clone,
>> There is no garbage after 'git clone'...
>
> And since there is no garbage, the gc will not write the commit-graph.

I should probably have replied to this rather than to SZEDER's message;
anyway, my 0.02 on that is in
https://public-inbox.org/git/87r2h7gmd7.fsf@xxxxxxxxxxxxxxxxxxx/

>>
>>> and change the
>>> need_to_gc() / cmd_gc() behavior so that we detect that the
>>> gc.writeCommitGraph=true setting is on, but we have no commit graph, and
>>> then just generate that without doing a full repack.
>> Or just teach 'git clone' to run 'git commit-graph write ...'
>
> I plan to add a 'fetch.writeCommitGraph' config setting. I was waiting
> until the file is incremental (on my to-do list soon), so the write is
> fast when only adding a few commits at a time. This would cover the
> clone case, too.

It's rearranging deck chairs on the Titanic at this point, but this
approach seems like the wrong way to go about this whole "do we have
crap to do?" git-gc state machine.

In my mind there should be only one entry point into that, and there
shouldn't be magic like "here's the gc-ish stuff we do on fetch". If we
care about a bunch of new commits being added on "fetch", that can also
happen on "commit", "am" and "merge", all of which already run
"gc --auto".

That's why I'm suggesting that we could add a sub-mode to need_to_gc()
that detects whether a file we want to generate is entirely missing.
That's extendable to future formats, and the only caveat at that point
is whether we'd like that subset of "gc" to block and run in the
foreground in the "clone" (or "fetch", ...) case.
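Roughly, the "is a generated file missing?" check could look like the
hypothetical sketch below. None of it is existing Git API: struct
generated_file, need_to_generate_missing() and the idea of a table of
config-gated files are illustrations only. It assumes the "enabled" bits
were already filled in from config (e.g. gc.writeCommitGraph) and that
paths are relative to the object directory.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical: one file that "gc --auto" would generate when its config is on. */
struct generated_file {
	const char *config_key; /* e.g. "gc.writeCommitGraph" (illustrative) */
	const char *path;       /* relative to the object directory */
	int enabled;            /* assumed to have been read from config elsewhere */
};

/*
 * Hypothetical sub-mode for need_to_gc(): return 1 if any enabled
 * generated file is entirely missing, so that only that part of "gc"
 * needs to run (no full repack); return 0 otherwise.
 */
static int need_to_generate_missing(const char *objdir,
				    const struct generated_file *files,
				    size_t nr)
{
	char path[4096];
	struct stat st;

	for (size_t i = 0; i < nr; i++) {
		if (!files[i].enabled)
			continue;
		snprintf(path, sizeof(path), "%s/%s", objdir, files[i].path);
		if (stat(path, &st) < 0)
			return 1; /* configured on, but the file doesn't exist */
	}
	return 0;
}
```

Adding a future format would then just be another table entry, e.g.
{ "gc.writeCommitGraph", "info/commit-graph", 1 } for the commit graph.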

And then, if we want to incrementally add recently added commits to
such formats, "gc --auto" could learn to consume reflogs or some other
general inventory of "stuff added since the last gc". Then we wouldn't
have to instrument "fetch" specifically; the same would work for
"commit", "am", "merge" etc.
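As a rough illustration of the reflog idea, here's a hypothetical helper
that reads one reflog line ("<old> <new> <ident> <timestamp> <tz>\t<msg>")
and reports its new tip if it's newer than some cutoff. The name
reflog_entry_since(), the SHA-1-only HEX_OID_LEN and where the "last gc"
cutoff comes from are all assumptions, not anything gc.c does today.

```c
#include <stdlib.h>
#include <string.h>

#define HEX_OID_LEN 40 /* assumption: SHA-1 object IDs only */

/*
 * Hypothetical: parse one reflog line of the form
 *   <old-oid> SP <new-oid> SP <name> SP <<email>> SP <timestamp> SP <tz> TAB <msg>
 * If its timestamp is newer than `since`, copy <new-oid> into `new_oid`
 * (HEX_OID_LEN + 1 bytes) and return 1; otherwise return 0.
 */
static int reflog_entry_since(const char *line, long long since, char *new_oid)
{
	const char *tab = strchr(line, '\t');
	size_t header_len = tab ? (size_t)(tab - line) : strlen(line);
	const char *gt = NULL;
	long long ts;

	if (header_len < 2 * HEX_OID_LEN + 2 ||
	    line[HEX_OID_LEN] != ' ' || line[2 * HEX_OID_LEN + 1] != ' ')
		return 0;

	/* The timestamp follows the last '>' (end of the email) in the header. */
	for (size_t i = 0; i < header_len; i++)
		if (line[i] == '>')
			gt = line + i;
	if (!gt)
		return 0;

	ts = strtoll(gt + 1, NULL, 10);
	if (ts <= since)
		return 0;

	memcpy(new_oid, line + HEX_OID_LEN + 1, HEX_OID_LEN);
	new_oid[HEX_OID_LEN] = '\0';
	return 1;
}
```

Feeding every such tip to an incremental writer would then cover
"fetch", "commit", "am", "merge" etc. through the same code path.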