Re: [PATCH 1/2] commit-graph write: add progress output

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 04 Sep 2018 15:07:16 -0700

Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:

> Before this change the "commit-graph write" command didn't report any

Please describe the pre-patch state in present tense without "Before
this change".

> progress. On my machine this command takes more than 10 seconds to
> write the graph for linux.git, and around 1m30s on the
> 2015-04-03-1M-git.git[1] test repository, which is a test case for
> larger monorepos.
>
> Furthermore, since the gc.writeCommitGraph setting was added in
> d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
> there was no indication at all from a "git gc" run that anything was
> different. This why one of the progress bars being added here uses

"This is why", I guess.

> start_progress() instead of start_delayed_progress(), so that it's
> guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
> repository:
>
>     $ git -c gc.writeCommitGraph=true gc
>     Enumerating objects: 2821, done.
>     [...]
>     Total 2821 (delta 1670), reused 2821 (delta 1670)
>     Computing commit graph generation numbers: 100% (867/867), done.
>
> On larger repositories, such as linux.git the delayed progress bar(s)

"such as linux.git, the delayed ..."

> With --stdin-packs we don't show any estimation of how much is left to
> do. This is because we might be processing more than one pack. We
> could be less lazy here and show progress, either detect by detecting
> that we're only processing one pack, or by first looping over the
> packs to discover how many commits they have. I don't see the point in

I do not know if there is no point, but if we were to do it, I think
slurping the list of packs and computing the number of objects is
not all that bad.

>  static void compute_generation_numbers(struct packed_commit_list* commits)
>  {
>  	int i;
>  	struct commit_list *list = NULL;
> +	struct progress *progress = NULL;
>  
> +	progress = start_progress(
> +		_("Computing commit graph generation numbers"), commits->nr);
>  	for (i = 0; i < commits->nr; i++) {
> +		display_progress(progress, i);
>  		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
>  		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>  			continue;

I am wondering if the progress call should be moved after this
conditional continue; would we want to count the entry whose
generation is already known here?  Of course, as we give commits->nr
as the 100% ceiling, we cannot avoid doing so, but it somehow smells
wrong.