Re: [PATCH v3 1/2] commit-graph write: add progress output

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Thu, 11 Oct 2018 19:52:21 +0200

On Wed, Oct 10 2018, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Oct 10 2018, SZEDER Gábor wrote:
>
>> On Wed, Oct 10, 2018 at 11:56:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>> On Wed, Oct 10 2018, SZEDER Gábor wrote:
>>
>>> >>  	for (i = 0; i < oids->nr; i++) {
>>> >> +		display_progress(progress, ++j);
>>>
>>> [...]
>>>
>>> > This display_progress() call, however, doesn't seem to be necessary.
>>> > First, it counts all commits for a second time, resulting in the ~2x
>>> > difference compared to the actual number of commits, and then causing
>>> > my confusion.  Second, all what this loop is doing is setting a flag
>>> > in commits that were already looked up and parsed in the above loops.
>>> > IOW this loop is very fast, and the progress indicator jumps from
>>> > ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
>>> > progress indicator at all.
>>>
>>> You're right, I tried this patch on top:
>>
>> [...]
>>
>>> And on a large repo with around 3 million commits the 3rd progress bar
>>> doesn't kick in.
>>>
>>> But if I apply this on top:
>>>
>> [...]
>>>
>>> I.e. start the timer after 1/4 of a second instead of 1 second, I get
>>> that progress bar.
>>>
>>> So I'm inclined to keep it. It just needs to be 4x the size before it's
>>> noticeably hanging for 1 second.
>>
>> Just to clarify: are you worried about a 1 second hang in an approx. 12
>> million commit repository?  If so, then I'm unconvinced, that's not
>> even a blip on the radar, and the misleading numbers are far worse.
>
> It's not a blip on the runtime, but the point of these progress bars in
> general is so we don't have a UI where there's no UI differnce between
> git hanging and just doing work in some tight loop in the background,
> and even 1 second when you're watching something is noticeable if it
> stalls.
>
> Also it's 1 second on a server where I had 128G of RAM. I think even a
> "trivial" flag change like this would very much change if e.g. the
> system was under memory pressure or was swapping.
>
> And as noted code like this tends to change over time, that loop might
> get more expensive, so let's future proof by having all the loops over N
> call the progress code.
>
> When I wrote this the intent was just "report progress". So that it's
> counting anything is just an implementation detail of how progress.c
> works now.
>
> This was the reference to Duy's patch, i.e. instead of spewing numbers
> at the user here let's just render a spinner. Then we no longer need to
> make judgement calls about which loop over N is expensive right now, and
> which one isn't, and if any of them will result in reporting a 2N number
> while the user might be more familiar with or expecting N.
>
>>> That repo isn't all that big compared to what we've heard about out
>>> there, and inner loops like this have a tendency to accumulate some more
>>> code over time without a re-visit of why we weren't monitoring progress
>>> there.
>>>
>>> But maybe we can fix the message. We say "Annotating commits in commit
>>> grap", not "Counting" or whatever. I was trying to find something that
>>> didn't imply that we were doing this once. One can annotate a thing more
>>> than once, but maybe ther's a better way to explain this...
>>
>> IMO just remove it.

Hrm, actually reading this again your initial post says we end up with a
2x difference v.s. the number of commits, but it's actually 3x. The loop
that has a rather trivial runtime comparatively is the 3x, but the 2x
loop takes a notable amount of time. So e.g. on git.git:

    $ git rev-list --all | wc -l; ~/g/git/git commit-graph write
    166678
    Annotating commits in commit graph: 518463, done.
    Computing commit graph generation numbers: 100% (172685/172685), done.