Re: [PATCH 2/2] read-cache: fix incorrect count and progress bar stalling

René Scharfe <l.s.r@xxxxxx> · Mon, 14 Jun 2021 19:18:05 +0200

Am 14.06.21 um 13:07 schrieb Ævar Arnfjörð Bjarmason:
>
> On Thu, Jun 10 2021, René Scharfe wrote:
>
>> Am 09.06.21 um 00:12 schrieb Ævar Arnfjörð Bjarmason:
>>>
>>> On Tue, Jun 08 2021, René Scharfe wrote:
>>>
>>>> I wonder (only in a semi-curious way, though) if we can detect
>>>> off-by-one errors by adding an assertion to display_progress() that
>>>> requires the first update to have the value 0, and in stop_progress()
>>>> one that requires the previous display_progress() call to have a value
>>>> equal to the total number of work items.  Not sure it'd be worth the
>>>> hassle..
>>>
>>> That's intentional. We started eating 3 apples, got to one, but now our
>>> house is on fire and we're eating no more apples today, even if we
>>> planned to eat 3 when we sat down.
>>>
>>> The progress bar reflects this unexpected but recoverable state:
>>>
>>>     $ perl -wE 'for (0..1) { say "update"; say "progress $_" }' |
>>>       ./helper/test-tool progress --total=3 Apples 2>&1 |
>>>       cat -v | perl -pe 's/\^M\K/\n/g'
>>>     Apples:   0% (0/3)^M
>>>     Apples:  33% (1/3)^M
>>>     Apples:  33% (1/3), done.
>>>
>>> We're at 1/3, but we're done. No more apples.
>>>
>>> This isn't just some hypothetical, e.g. consider neeing to unlink() or
>>> remove files/directories one at a time in a directory and getting the
>>> estimated number from st_nlink (yeah yeah, unportable, but it was the
>>> first thing I thought of).
>>>
>>> We might think we're processing 10 entries, but another other processes
>>> might make our progress bar end at more or less than the 100% we
>>> expected. That's OK, not something we should invoke BUG() about.
>>
>> It doesn't have to be a BUG; a warning would suffice.  And I hope not
>> finishing the expected number of items due to a catastrophic event is
>> rare enough that an additional warning wouldn't cause too much pain.
>
> It's not a catastrophic event, just a run of the mill race condition
> we'll expect if we're dealing with the real world.
>
> E.g. you asked to unlink 1000 files, we do so, we find 10 are unlinked
> already, or the command is asked to recursively unlink all files in a
> directory tree, and new ones have showed up.
>
> In those cases we should just just shrug and move on, no need for a
> warning. We just don't always have perfect information about future
> state at the start of the loop.

If a planned work item is cancelled then it can still be counted as
done.  Or the total could be adjusted, but that might look awkward.

>> Loops that *regularly* end early are not a good fit for progress
>> percentages, I think.
>
> Arguably yes, but in these fuzzy cases not providing a "total" means
> showing no progress at all, just a counter. Perhaps we should have some
> other "provide total, and it may be fuzzy" flag. Not providing it might
> run into your proposed BUG(), my point was that the current API
> providing this flexibility is intentional.

Your patch turns a loop that doesn't immediately report skipped items
into one with contiguous progress updates.  That's a good way to deal
with the imagined restrictions for error detection.  Another would be
to make the warnings optional.

>>> Similarly, the n=0 being distinguishable from the first
>>> display_progress() is actually useful in practice. It's something I've
>>> seen git.git emit (not recently, I patched the relevant code to emit
>>> more granular progress).
>>>
>>> It's useful to know that we're stalling on the setup code before the
>>> for-loop, not on the first item.
>>
>> Hmm, preparations that take a noticeable time might deserve their own
>> progress line.
>
> Sure, and I've split some of those up in the past, but this seems like
> ducking/not addressing the point that the API use we disagree on has
> your preferred use conflating these conditions, but mine does not...

Subtle.  If preparation takes a long time and each item less than that
then the progress display is likely to jump from "0/n" to "i/n", where
i > 1, and the meaning of "1/n" becomes moot.

The progress display could show just the title before the first
display_progress() call to make the distinction clear.  Would it really
be useful, though?  Perhaps a trace2 region started by the first
display_progress() call and ended by the last one (n == total) would
be better.

>> Anyway, if no guard rails can be built then we have to rely on our math
>> skills alone.  Off-by-one errors may look silly, but are no joke -- they
>> are surprisingly easy to make.
>
> ...which, regardless of whether one views a progress of "1/5 items" has
> "finished 1/5" or "working on 1/5", which I think *in general* is an
> arbitrary choice, I think the progress.c API we have in git.git clearly
> fits the usage I'm describing better.

How so?  start_progress() specifies a title and the total number of
items, display_progress() reports some other number that is shown in
relation to the total, and stop_progress() finishes the progress line.
This API is not documented and thus its meaning is (strictly speaking)
left unspecified.

It can be used to show a classic "percent-done progress indicator", as
https://dl.acm.org/doi/10.1145/1165385.317459 calls it.  That's how I
read a growing percentage shown by a program, and "done" I understand
as "has been done" (completed), not as "is being done".

Wikipedia sent me to
https://chrisharrison.net/projects/progressbars/ProgBarHarrison.pdf,
which has some fun ideas on how to warp the perception of time for
users staring at a progress bar, but also states typical ones grow
with the amount of completed work.

René