On 14.06.21 at 21:08, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Jun 14 2021, René Scharfe wrote:
>
>> On 14.06.21 at 13:07, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> On Thu, Jun 10 2021, René Scharfe wrote:
>>>
>>>> On 09.06.21 at 00:12, Ævar Arnfjörð Bjarmason wrote:
>>>>>
>>>>> On Tue, Jun 08 2021, René Scharfe wrote:
>>>>>
>>>>>> I wonder (only in a semi-curious way, though) if we can detect
>>>>>> off-by-one errors by adding an assertion to display_progress() that
>>>>>> requires the first update to have the value 0, and in stop_progress()
>>>>>> one that requires the previous display_progress() call to have a value
>>>>>> equal to the total number of work items. Not sure it'd be worth the
>>>>>> hassle.
>>>>>
>>>>> That's intentional. We started eating 3 apples, got to one, but now our
>>>>> house is on fire and we're eating no more apples today, even if we
>>>>> planned to eat 3 when we sat down.
>>>>>
>>>>> The progress bar reflects this unexpected but recoverable state:
>>>>>
>>>>>     $ perl -wE 'for (0..1) { say "update"; say "progress $_" }' |
>>>>>       ./helper/test-tool progress --total=3 Apples 2>&1 |
>>>>>       cat -v | perl -pe 's/\^M\K/\n/g'
>>>>>     Apples:   0% (0/3)^M
>>>>>     Apples:  33% (1/3)^M
>>>>>     Apples:  33% (1/3), done.
>>>>>
>>>>> We're at 1/3, but we're done. No more apples.
>>>>>
>>>>> This isn't just some hypothetical; e.g. consider needing to unlink() or
>>>>> remove files/directories one at a time in a directory and getting the
>>>>> estimated number from st_nlink (yeah yeah, unportable, but it was the
>>>>> first thing I thought of).
>>>>>
>>>>> We might think we're processing 10 entries, but other processes
>>>>> might make our progress bar end at more or less than the 100% we
>>>>> expected. That's OK, not something we should invoke BUG() about.
>>>>
>>>> It doesn't have to be a BUG; a warning would suffice. And I hope not
>>>> finishing the expected number of items due to a catastrophic event is
>>>> rare enough that an additional warning wouldn't cause too much pain.
>>>
>>> It's not a catastrophic event, just a run-of-the-mill race condition
>>> we'll expect if we're dealing with the real world.
>>>
>>> E.g. you asked to unlink 1000 files, we do so, we find 10 are unlinked
>>> already, or the command is asked to recursively unlink all files in a
>>> directory tree, and new ones have shown up.
>>>
>>> In those cases we should just shrug and move on, no need for a
>>> warning. We just don't always have perfect information about future
>>> state at the start of the loop.
>>
>> If a planned work item is cancelled then it can still be counted as
>> done. Or the total could be adjusted, but that might look awkward.
>>
>>>> Loops that *regularly* end early are not a good fit for progress
>>>> percentages, I think.
>>>
>>> Arguably yes, but in these fuzzy cases not providing a "total" means
>>> showing no progress at all, just a counter. Perhaps we should have some
>>> other "provide total, and it may be fuzzy" flag. Not providing it might
>>> run into your proposed BUG(); my point was that the current API
>>> providing this flexibility is intentional.
>>
>> Your patch turns a loop that doesn't immediately report skipped items
>> into one with contiguous progress updates. That's a good way to deal
>> with the imagined restrictions for error detection. Another would be
>> to make the warnings optional.
>
> I don't see how there's anything wrong with the API use, how it needs a
> warning etc.

You pointed out that many callsites do:

    for (i = 0; i < large_number; i++) {
        display_progress(p, i + 1);
        /* work work work */
    }

This is an off-by-one error because a finished item is reported before
work on it starts. Adding a warning can help find these cases reliably.
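Such a check could look something like this -- a rough sketch only,
reusing the last_value and total fields of struct progress; the
warning() calls are hypothetical additions, not actual progress.c code:

    /*
     * Hypothetical sketch: last_value starts out as -1 in
     * start_progress_delay(), so the very first display_progress()
     * call is recognizable here.
     */
    void display_progress(struct progress *progress, uint64_t n)
    {
        if (progress && progress->last_value == (uint64_t)-1 && n != 0)
            warning("progress '%s' started at %"PRIuMAX", not at 0",
                    progress->title, (uintmax_t)n);
        display(progress, n, NULL);
    }

    void stop_progress(struct progress **p_progress)
    {
        struct progress *progress = *p_progress;

        /* Hypothetical: the counter never reached the announced total. */
        if (progress && progress->total &&
            progress->last_value != progress->total)
            warning("progress '%s' stopped at %"PRIuMAX" of %"PRIuMAX,
                    progress->title, (uintmax_t)progress->last_value,
                    (uintmax_t)progress->total);
        /* ... existing cleanup and final "done" display ... */
    }

Making these checks opt-in (say, enabled only for the test suite) would
avoid bothering users in the cases where a total is known to be fuzzy.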
>>>>> Similarly, the n=0 being distinguishable from the first
>>>>> display_progress() is actually useful in practice. It's something I've
>>>>> seen git.git emit (not recently, I patched the relevant code to emit
>>>>> more granular progress).
>>>>>
>>>>> It's useful to know that we're stalling on the setup code before the
>>>>> for-loop, not on the first item.
>>>>
>>>> Hmm, preparations that take a noticeable time might deserve their own
>>>> progress line.
>>>
>>> Sure, and I've split some of those up in the past, but this seems like
>>> ducking/not addressing the point that the API use we disagree on has
>>> your preferred use conflating these conditions, but mine does not...
>>
>> Subtle. If preparation takes a long time and each item less than that
>> then the progress display is likely to jump from "0/n" to "i/n", where
>> i > 1, and the meaning of "1/n" becomes moot.
>
> In practice we're making humongous jumps all over the place, we don't
> write to the terminal for every item processed, and if we did it would
> be too fast to be perceptible to the user.
>
> So I don't think this is an issue in the first place, as noted upthread
> in <8735tt4fhx.fsf@xxxxxxxxxxxxxxxxxxx>. Regardless of what we think of
> the supposed off-by-one issue, you seemed to think that it was enough of
> an issue to justify complexity at the API-use level (needing to think
> about "continue" statements in loops, etc.), but now you think it's
> moot?

I don't understand your question. Let me rephrase what I find moot: you
wrote that the first display_progress() call being made with n > 0 would
be useful to you to see long-running preparations. If items are processed
more quickly than one per second, then whatever number the first
display_progress() call writes to the screen will be overwritten within a
second. So the value of the first update shouldn't actually matter much
for your use case -- unless items take a long time to process.

René
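The overwriting claim above rests on how git throttles progress output:
a one-second timer sets a flag, and the line is only redrawn when that
flag is set or the percentage changes. A simplified sketch of that rule
(not the actual progress.c code; the signal/timer setup, Windows
handling, and the final "done" message are omitted):

    /* Set once per second by a timer started in start_progress(). */
    static volatile sig_atomic_t progress_update;

    static void display(struct progress *progress, uint64_t n)
    {
        progress->last_value = n;

        if (progress->total) {
            unsigned percent = n * 100 / progress->total;
            if (percent != progress->last_percent || progress_update) {
                progress->last_percent = percent;
                /* redraw "title: percent% (n/total)", clear the flag */
                progress_update = 0;
            }
        } else if (progress_update) {
            /* redraw "title: n", clear the flag */
            progress_update = 0;
        }
    }

On a busy loop the first reported value is therefore visible for at most
a second before the next redraw replaces it.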