Re: [PATCH v3 10/10] progress.c: add & assert a "global_progress" variable

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Mon, 25 Oct 2021 13:06:03 +0200

On Mon, Oct 25 2021, SZEDER Gábor wrote:

> On Thu, Oct 14, 2021 at 12:28:26AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> The progress.c code makes a hard assumption that only one progress bar
>> be active at a time (see [1] for a bug where this wasn't the
>> case). Add a BUG() that'll trigger if we ever regress on that promise
>> and have two progress bars active at the same time.
>
> I still very much dislike the idea of a BUG() in the progress code
> that can trigger outside of the test suite, because the progress line
> is only a UI gimmick and not a crucial part of any Git operation, and
> even though a progress line might be buggy, the underlying Git
> operation is not affected by it and would still finish successfully,
> as was the case with the dozen of so progress line bugs in the past.
>
>> There was an alternative test-only approach to doing the same
>> thing[2], but by doing this outside of a GIT_TEST_* mode we'll know
>> we've put a hard stop to this particular API misuse.
>> 
>> It will also establish scaffolding to address current fundamental
>> limitations in the progress output: The current output must be
>> "driven" by calls to the likes of display_progress().
>
> Please elaborate why that is a "fundamental limitation"; I don't see
> any drawback of the current approach.
>
>> Once we have a
>> global current progress object we'll be able to update that object via
>> SIGALRM.
>
> What are the supposed benefits of doing that?  I do see its drawbacks,
> considering that we have progress lines that are updated from multiple
> threads.

I've updated the commit messages in a re-roll I have incoming to
hopefully clear this up.

>> See [3] for early code to do that.
>> 
>> It's conceivable that this change will hit the BUG() condition in some
>> scenario that we don't currently have tests for, this would be very
>> bad. If that happened we'd die just because we couldn't emit some
>> pretty output.
>> 
>> See [4] for a discussion of why our test coverage is lacking; our
>> progress display is hidden behind isatty(2) checks in many cases, so
>> the test suite doesn't cover it unless individual tests are run in
>> "--verbose" mode, we might also have multi-threaded use of the API, so
>> two progress bars stopping and starting would only be visible due to a
>> race condition.
>> 
>> Despite that, I think that this change won't introduce such
>> regressions, because:
>> 
>>  1. I've read all the code using the progress API (and have modified a
>>     large part of it in some WIP code I have). Almost all of it is really
>>     simple, the parts that aren't[5] are complex in the display_progress() part,
>>     not in starting or stopping the progress bar.
>> 
>>  2. The entire test suite passes when instrumented with an ad-hoc
>>     Linux-specific mode (it uses gettid()) to die if progress bars are
>>     ever started or stopped on anything but the main thread[6].
>> 
>>     Extending that to die if display_progress() is called in a thread
>>     reveals that we have exactly two users of the progress bar under
>>     threaded conditions, "git index-pack" and "git pack-objects". Both
>>     uses are straightforward, and they don't start/stop the progress
>>     bar when threads are active.
>> 
>>  3. I've likewise done an ad-hoc test to force progress bars to be
>>     displayed with:
>> 
>>         perl -pi -e 's[isatty\(2\)][1]g' $(git grep -l -F 'isatty(2)')
>> 
>>     I.e. to replace all checks (not just for progress) of checking
>>     whether STDERR is connected to a TTY, and then monkeypatching
>>     is_foreground_fd() in progress.c to always "return 1". Running the
>>     tests with those applied, interactively and under -V reveals via:
>> 
>>         $ grep -e set_progress_signal -e clear_progress_signal test-results/*out
>> 
>>     That nothing our tests cover hits the BUG conditions added here,
>>     except the expected "BUG: start two concurrent progress bars" test
>>     being added here.
>> 
>>     That isn't entirely true since we won't be getting 100% coverage
>>     due to cascading failures from tests that expected no progress
>>     output on stderr. To make sure I covered 100% I also tried making
>>     the display() function in progress.c a NOOP on top of that (it's
>>     the calls to start_progress_delay() and stop_progress()) that
>>     matter.
>> 
>>     That doesn't hit the BUG() either. Some tests fail in that mode
>>     due to a combination of the overzealous isatty(2) munging noted
>>     above, and the tests that are testing that the progress output
>>     itself is present (but for testing I'd made display() a NOOP).
>> 
>> Between those three points I think it's safe to go ahead with this
>> change.
>
> The above analysis only considers _our_ _current_ codebase.
>
> However, even though this might be safe now, it doesn't mean that it
> will remain safe in the future, as we might add new progress lines
> that lack test coverage (though hopefully won't), and would hit that
> BUG() at a user.
>
> Furthermore, even though this might be safe in our codebase, it
> doesn't mean that it is safe in some 20+k forks of Git that exist on
> GitHub alone (I for one have a git command or two with in my fork
> which output progress lines but, sadly, have zero test coverage).
>
> But more importantly, even though it might be safe to do so, that
> doesn't mean that it's a good idea to do so.  The commit message does
> little to justify why it is conceptually a good idea to add this BUG()
> to the progress code in a way that it can trigger outside of the test
> suite.

I think partially I've addressed this above (i.e. in the incoming
re-roll's update commit message), but I might not for the question of
whether this is worth it overall. I'll update the commit message to
address this specific case, which was missing from it.

>> 1. 6f9d5f2fda1 (commit-graph: fix progress of reachable commits, 2020-07-09)
>> 2. https://lore.kernel.org/git/20210620200303.2328957-3-szeder.dev@xxxxxxxxx
>> 3. https://lore.kernel.org/git/patch-18.25-e21fc66623f-20210623T155626Z-avarab@xxxxxxxxx/
>> 4. https://lore.kernel.org/git/cover-00.25-00000000000-20210623T155626Z-avarab@xxxxxxxxx/
>> 5. b50c37aa44d (Merge branch 'ab/progress-users-adjust-counters' into
>>    next, 2021-09-10)
>> 6. https://lore.kernel.org/git/877dffg37n.fsf@xxxxxxxxxxxxxxxxxxx/
>> 
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
>> ---
>>  progress.c                  | 18 ++++++++++++++----
>>  t/t0500-progress-display.sh | 11 +++++++++++
>>  2 files changed, 25 insertions(+), 4 deletions(-)
>> 
>> diff --git a/progress.c b/progress.c
>> index b9369e9a264..a31500f8b2b 100644
>> --- a/progress.c
>> +++ b/progress.c
>> @@ -46,6 +46,7 @@ struct progress {
>>  };
>>  
>>  static volatile sig_atomic_t progress_update;
>> +static struct progress *global_progress;
>>  
>>  /*
>>   * These are only intended for testing the progress output, i.e. exclusively
>> @@ -219,11 +220,16 @@ void progress_test_force_update(void)
>>  	progress_interval(SIGALRM);
>>  }
>>  
>> -static void set_progress_signal(void)
>> +static void set_progress_signal(struct progress *progress)
>>  {
>>  	struct sigaction sa;
>>  	struct itimerval v;
>>  
>> +	if (global_progress)
>> +		BUG("'%s' progress still active when trying to start '%s'",
>> +		    global_progress->title, progress->title);
>> +	global_progress = progress;
>
> This function is called set_progress_signal(), so checking and setting
> 'global_progress' feels out of place here; it would be better to do
> that in start_progress_delay().
>
>> +
>>  	if (progress_testing)
>>  		return;
>>  
>> @@ -241,10 +247,14 @@ static void set_progress_signal(void)
>>  	setitimer(ITIMER_REAL, &v, NULL);
>>  }
>>  
>> -static void clear_progress_signal(void)
>> +static void clear_progress_signal(struct progress *progress)
>>  {
>>  	struct itimerval v = {{0,},};
>>  
>> +	if (!global_progress)
>> +		BUG("should have active global_progress when cleaning up");
>> +	global_progress = NULL;
>
> Likewise.

*Nod* cleaned up this whole part, which became much simpler overall as a
result, thank you.