Am 14.06.21 um 13:07 schrieb Ævar Arnfjörð Bjarmason: > > On Thu, Jun 10 2021, René Scharfe wrote: > >> Am 09.06.21 um 00:12 schrieb Ævar Arnfjörð Bjarmason: >>> >>> On Tue, Jun 08 2021, René Scharfe wrote: >>> >>>> I wonder (only in a semi-curious way, though) if we can detect >>>> off-by-one errors by adding an assertion to display_progress() that >>>> requires the first update to have the value 0, and in stop_progress() >>>> one that requires the previous display_progress() call to have a value >>>> equal to the total number of work items. Not sure it'd be worth the >>>> hassle.. >>> >>> That's intentional. We started eating 3 apples, got to one, but now our >>> house is on fire and we're eating no more apples today, even if we >>> planned to eat 3 when we sat down. >>> >>> The progress bar reflects this unexpected but recoverable state: >>> >>> $ perl -wE 'for (0..1) { say "update"; say "progress $_" }' | >>> ./helper/test-tool progress --total=3 Apples 2>&1 | >>> cat -v | perl -pe 's/\^M\K/\n/g' >>> Apples: 0% (0/3)^M >>> Apples: 33% (1/3)^M >>> Apples: 33% (1/3), done. >>> >>> We're at 1/3, but we're done. No more apples. >>> >>> This isn't just some hypothetical, e.g. consider neeing to unlink() or >>> remove files/directories one at a time in a directory and getting the >>> estimated number from st_nlink (yeah yeah, unportable, but it was the >>> first thing I thought of). >>> >>> We might think we're processing 10 entries, but another other processes >>> might make our progress bar end at more or less than the 100% we >>> expected. That's OK, not something we should invoke BUG() about. >> >> It doesn't have to be a BUG; a warning would suffice. And I hope not >> finishing the expected number of items due to a catastrophic event is >> rare enough that an additional warning wouldn't cause too much pain. > > It's not a catastrophic event, just a run of the mill race condition > we'll expect if we're dealing with the real world. > > E.g. you asked to unlink 1000 files, we do so, we find 10 are unlinked > already, or the command is asked to recursively unlink all files in a > directory tree, and new ones have showed up. > > In those cases we should just just shrug and move on, no need for a > warning. We just don't always have perfect information about future > state at the start of the loop. If a planned work item is cancelled then it can still be counted as done. Or the total could be adjusted, but that might look awkward. >> Loops that *regularly* end early are not a good fit for progress >> percentages, I think. > > Arguably yes, but in these fuzzy cases not providing a "total" means > showing no progress at all, just a counter. Perhaps we should have some > other "provide total, and it may be fuzzy" flag. Not providing it might > run into your proposed BUG(), my point was that the current API > providing this flexibility is intentional. Your patch turns a loop that doesn't immediately report skipped items into one with contiguous progress updates. That's a good way to deal with the imagined restrictions for error detection. Another would be to make the warnings optional. >>> Similarly, the n=0 being distinguishable from the first >>> display_progress() is actually useful in practice. It's something I've >>> seen git.git emit (not recently, I patched the relevant code to emit >>> more granular progress). >>> >>> It's useful to know that we're stalling on the setup code before the >>> for-loop, not on the first item. >> >> Hmm, preparations that take a noticeable time might deserve their own >> progress line. > > Sure, and I've split some of those up in the past, but this seems like > ducking/not addressing the point that the API use we disagree on has > your preferred use conflating these conditions, but mine does not... Subtle. If preparation takes a long time and each item less than that then the progress display is likely to jump from "0/n" to "i/n", where i > 1, and the meaning of "1/n" becomes moot. The progress display could show just the title before the first display_progress() call to make the distinction clear. Would it really be useful, though? Perhaps a trace2 region started by the first display_progress() call and ended by the last one (n == total) would be better. >> Anyway, if no guard rails can be built then we have to rely on our math >> skills alone. Off-by-one errors may look silly, but are no joke -- they >> are surprisingly easy to make. > > ...which, regardless of whether one views a progress of "1/5 items" has > "finished 1/5" or "working on 1/5", which I think *in general* is an > arbitrary choice, I think the progress.c API we have in git.git clearly > fits the usage I'm describing better. How so? start_progress() specifies a title and the total number of items, display_progress() reports some other number that is shown in relation to the total, and stop_progress() finishes the progress line. This API is not documented and thus its meaning is (strictly speaking) left unspecified. It can be used to show a classic "percent-done progress indicator", as https://dl.acm.org/doi/10.1145/1165385.317459 calls it. That's how I read a growing percentage shown by a program, and "done" I understand as "has been done" (completed), not as "is being done". Wikipedia sent me to https://chrisharrison.net/projects/progressbars/ProgBarHarrison.pdf, which has some fun ideas on how to warp the perception of time for users staring at a progress bar, but also states typical ones grow with the amount of completed work. René