Re: [PATCH 2/2] read-cache: fix incorrect count and progress bar stalling

René Scharfe <l.s.r@xxxxxx> · Sun, 20 Jun 2021 14:53:05 +0200

Am 15.06.21 um 18:46 schrieb Ævar Arnfjörð Bjarmason:
> We disagree, and I for one think I understand what you mean, perhaps you
> don't understand what I mean, but let's try to move on.

Perhaps.  You seem to think of progress as being represented by a real
number.  We show integers, though, and you want to round up.

The progress display's function is to inform the user that work is being
done and how much there is still to do.  It allows them to decide
whether to keep the program running.

A progress of "100%" being shown for an extended duration would lead me
to the conclusion that the program hangs and I'd cancel it.  Rounding
down (truncating) prevents that.
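To make the difference concrete, here is a minimal C sketch of the two ways to turn a count into a percentage. The helper names are hypothetical and this is not the code in progress.c. It assumes total > 0 and ignores overflow for huge counts.

```c
#include <stdint.h>

/* Round down (truncate): 100% appears only once every item is done. */
unsigned percent_truncated(uint64_t done, uint64_t total)
{
	return (unsigned)(done * 100 / total);
}

/* Round up: 100% can appear while work is still outstanding. */
unsigned percent_rounded_up(uint64_t done, uint64_t total)
{
	return (unsigned)((done * 100 + total - 1) / total);
}
```

With 999 of 1000 items done, truncation shows 99%, while rounding up already shows 100% even though one item is still left.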

Showing an estimated time of completion as well would be even nicer,
but is only practical if the time taken for each work item is roughly
the same.

But let's move on indeed.  The part of your patch that moves the
display_progress() call to the top of the loop to avoid stalling is a
good idea and worth splitting out into its own patch (keeping the "i").
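A rough C sketch of the loop shape I mean follows. show_progress() and needs_work() are made-up stand-ins (think display_progress() and whatever check decides to skip an entry), not git's actual API. Because the report comes first, it runs on every iteration even when the body bails out early with "continue", and the value passed is i, the number of items already completed.

```c
/*
 * Hypothetical progress sink standing in for something like
 * display_progress(); it only records what it was told.
 */
unsigned long progress_shown;
unsigned long progress_updates;

void show_progress(unsigned long completed)
{
	progress_shown = completed;
	progress_updates++;
}

/* Example predicate: pretend only even-numbered items need work. */
int needs_work(unsigned long i)
{
	return i % 2 == 0;
}

void process_items(unsigned long n)
{
	unsigned long i;

	for (i = 0; i < n; i++) {
		/*
		 * Report first: exactly i items are complete here, and the
		 * "continue" below cannot skip this call.
		 */
		show_progress(i);
		if (!needs_work(i))
			continue;
		/* ... the expensive work on item i would go here ... */
	}
	show_progress(n);	/* all n items have been handled */
}
```

With five items of which only the even ones need work, show_progress() is still called six times, ending with 5.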

In general it seems that changes described with "Let's also ..." or
"While at it ..." almost always deserve their own patch.  I need to
follow that insight more myself.

> I think it would be better if you replied specifically to the comments I
> had later about throughput progress bars, i.e.:
>
>     How does the idea that we show "has been done" make sense when you
>     combine the progress.c API with the display_throughput(). I.e. output
>     like[...]

Junio already replied to that, but since you ask, here are my thoughts:

Progress and throughput are separate metrics.  Adding one doesn't change
the other.  The throughput value is not specific to the currently
processed item.

Say we download a number of files of different sizes and want to show
our progress.  Then from time to time we display the number of processed
files and how many bytes we got since the last update, divided by the
time passed since then.  The reported bytes could belong to multiple
files.  Or we could process lots of zero-sized files, which would keep
throughput low.
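To illustrate that the two metrics are independent, here is a small C sketch. The struct and function names are invented for this example. The file count moves only when a file completes, while the rate is computed from bytes alone, so a batch of zero-sized files advances progress without raising throughput.

```c
#include <stdint.h>

/* Hypothetical download state: progress and throughput kept apart. */
struct transfer_stats {
	unsigned files_done;	/* progress: number of completed files */
	uint64_t bytes;		/* total bytes received so far */
	uint64_t last_bytes;	/* value of bytes at the previous update */
	double last_time;	/* time of the previous update, in seconds */
};

/* Data arrived; it may belong to any file. */
void got_bytes(struct transfer_stats *s, uint64_t n)
{
	s->bytes += n;
}

/* A file is complete; this is the only thing that moves progress. */
void file_done(struct transfer_stats *s)
{
	s->files_done++;
}

/*
 * Throughput: bytes received since the previous update divided by the
 * time passed since then.  It says nothing about which files the bytes
 * belonged to.
 */
double update_throughput(struct transfer_stats *s, double now)
{
	double elapsed = now - s->last_time;
	double rate = elapsed > 0 ?
		(double)(s->bytes - s->last_bytes) / elapsed : 0.0;

	s->last_bytes = s->bytes;
	s->last_time = now;
	return rate;
}
```

For example, 3000 bytes over 2 seconds gives 1500 bytes/s whether they came from one file or two; finishing eight empty files in the next second moves the count to 10 but leaves the rate at 0.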

> Anyway, in this case I understood you to mean that you thought the
> off-by-one wasn't a big deal in practice most of the time, I don't think
> so either for e.g. counting objects in pack files.

Not exactly.  While I think a difference of one isn't a big deal most
of the time, I also think there is a correct way, i.e. to show the number
of completed items.  You have found ways to use an off-by-one error, and
my point was that this usage is not reliable.  Perhaps that's a weak
and convoluted argument.

> I do think it's useful to be consistent though, and for e.g. cases of
> downloading 5 files it makes sense to show 1/5 if we are currently in
> the process of downloading files 1 out of 5, not 0/5 or whatever.

I agree that we should be consistent.  If we have downloaded 70% of the
first of five files then we have 0.7 files, which is not yet 1 file, so
we have to say 0/5.
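Put as code, counting completed items could look like the following minimal C sketch. The function is hypothetical and only for illustration. Each entry is the fraction of one file processed so far, and only entries that have reached 1.0 count.

```c
/* Count items that are fully done; partial items contribute nothing. */
unsigned completed_items(const double *fraction, unsigned n)
{
	unsigned k, done = 0;

	for (k = 0; k < n; k++)
		if (fraction[k] >= 1.0)
			done++;
	return done;
}
```

With the first of five files at 0.7 it returns 0, so the display reads 0/5.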

But let's move on, for real.

René