Re: [PATCH 0/7] progress: verify progress counters in the test suite

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Mon, 21 Jun 2021 02:59:53 +0200

On Sun, Jun 20 2021, SZEDER Gábor wrote:

> Splitting off from:
>
>   https://public-inbox.org/git/cover-0.2-0000000000-20210607T144206Z-avarab@xxxxxxxxx/T/#me5d3176914d4268fd9f2a96fc63f4e41beb26bd6
>
> On Tue, Jun 08, 2021 at 06:14:42PM +0200, René Scharfe wrote:
>> I wonder (only in a semi-curious way, though) if we can detect
>> off-by-one errors by adding an assertion to display_progress() that
>> requires the first update to have the value 0, and in stop_progress()
>> one that requires the previous display_progress() call to have a value
>> equal to the total number of work items.  Not sure it'd be worth the
>> hassle..
>
> I fixed and reported a number of bogus progress lines in the past, the
> last one during v2.31.0-rc phase, so I've looked into whether progress
> counters could be automatically validated in our tests, and came up
> with these patches a few months ago.  It turned out that progress
> counters can be checked easily and transparently in case of progress
> lines that are shown in the tests, i.e. that are shown even when
> stderr is not a terminal or are forced with '--progress'.  (In other
> cases it's still fairly easy but not quite transparent, as I think we
> need changes to the progress API; more on that later in a separate
> series.)

I've also been working on some progress.[ch] patches that are mostly
finished, and I'm some 20 patches in at the moment. I wasn't sure about
whether to send an alternate 20-patch "let's do this (mostly) instead?"
series, hence this message.

Much of what you're doing here becomes easier after that series,
e.g. your global process struct in 2/7 is something I ended up
implementing as part of a general feature to allow progress to be driven
by either display_progress() *or* the signal handler itself.

Thus we can show a "stalled" message if we run start(), but hang before
we ever call display_progress(), as we do on e.g. git.git in gc's
"Enumerating Objects" phase (at least on my laptop).

So e.g. your 2/7 becomes a general hard assertion, not some test-only
mode.

After that I use the same facility to implement a mode where any signal
can update a new "spinner" part of the progress bar. So let's say you're
hanging on item 1/3 and not calling display_progress() at all, we'll
update a spinner on each signal to show the user that git itself isn't
hanging, just working.

I could also rebase on yours, but much of it would be rewriting the
test-only code to be more generalized, perhaps it's easier if we start
going for the more generalized solution first.

Per some of what I mentioned in the thread you linked to I'm a bit
uncomfortable with the direction in your 1/7. I seems it works in-tree
for now, but I'd like to take the progress.c API in the direction of a
more generally useful API, not just something that narrowly fits the
exact set of current use-cases.

There's a lot of potential uses in-tree where the total not matching at
the end is just something that happens due to real-world fuzzyness,
e.g. the unlink() example here:
https://public-inbox.org/git/87lf7k2bem.fsf@xxxxxxxxxxxxxxxxxxx/

Perhaps we can just have it BUG() for now as you're doing and cross that
bridge when we come to it. I just wonder if we can't catch potential
bugs in a more gentle way somehow.

> These checks did uncover a couple of buggy progress lines which are
> fixed in this series as well, but I'm not sure that the fix presented
> in patch 6 is the right approach, hence the RFC.

The approach in 6/7 will also have the effect of not balancing a trace2
start/stop region. Quoting a line from its commit message:

    > Arguably, it is wrong to show "done" at the end of the progress
    > line when not all work was done.

I think for a more general API it makes sense to think of "done" as a
different state than "we have reached == total". The target may change
as in the unlink() example, or we may simply decide to abort and "be
done early".