On Mon, Oct 25 2021, SZEDER Gábor wrote: > On Thu, Oct 14, 2021 at 12:28:26AM +0200, Ævar Arnfjörð Bjarmason wrote: >> The progress.c code makes a hard assumption that only one progress bar >> be active at a time (see [1] for a bug where this wasn't the >> case). Add a BUG() that'll trigger if we ever regress on that promise >> and have two progress bars active at the same time. > > I still very much dislike the idea of a BUG() in the progress code > that can trigger outside of the test suite, because the progress line > is only a UI gimmick and not a crucial part of any Git operation, and > even though a progress line might be buggy, the underlying Git > operation is not affected by it and would still finish successfully, > as was the case with the dozen of so progress line bugs in the past. > >> There was an alternative test-only approach to doing the same >> thing[2], but by doing this outside of a GIT_TEST_* mode we'll know >> we've put a hard stop to this particular API misuse. >> >> It will also establish scaffolding to address current fundamental >> limitations in the progress output: The current output must be >> "driven" by calls to the likes of display_progress(). > > Please elaborate why that is a "fundamental limitation"; I don't see > any drawback of the current approach. > >> Once we have a >> global current progress object we'll be able to update that object via >> SIGALRM. > > What are the supposed benefits of doing that? I do see its drawbacks, > considering that we have progress lines that are updated from multiple > threads. I've updated the commit messages in a re-roll I have incoming to hopefully clear this up. >> See [3] for early code to do that. >> >> It's conceivable that this change will hit the BUG() condition in some >> scenario that we don't currently have tests for, this would be very >> bad. If that happened we'd die just because we couldn't emit some >> pretty output. >> >> See [4] for a discussion of why our test coverage is lacking; our >> progress display is hidden behind isatty(2) checks in many cases, so >> the test suite doesn't cover it unless individual tests are run in >> "--verbose" mode, we might also have multi-threaded use of the API, so >> two progress bars stopping and starting would only be visible due to a >> race condition. >> >> Despite that, I think that this change won't introduce such >> regressions, because: >> >> 1. I've read all the code using the progress API (and have modified a >> large part of it in some WIP code I have). Almost all of it is really >> simple, the parts that aren't[5] are complex in the display_progress() part, >> not in starting or stopping the progress bar. >> >> 2. The entire test suite passes when instrumented with an ad-hoc >> Linux-specific mode (it uses gettid()) to die if progress bars are >> ever started or stopped on anything but the main thread[6]. >> >> Extending that to die if display_progress() is called in a thread >> reveals that we have exactly two users of the progress bar under >> threaded conditions, "git index-pack" and "git pack-objects". Both >> uses are straightforward, and they don't start/stop the progress >> bar when threads are active. >> >> 3. I've likewise done an ad-hoc test to force progress bars to be >> displayed with: >> >> perl -pi -e 's[isatty\(2\)][1]g' $(git grep -l -F 'isatty(2)') >> >> I.e. to replace all checks (not just for progress) of checking >> whether STDERR is connected to a TTY, and then monkeypatching >> is_foreground_fd() in progress.c to always "return 1". Running the >> tests with those applied, interactively and under -V reveals via: >> >> $ grep -e set_progress_signal -e clear_progress_signal test-results/*out >> >> That nothing our tests cover hits the BUG conditions added here, >> except the expected "BUG: start two concurrent progress bars" test >> being added here. >> >> That isn't entirely true since we won't be getting 100% coverage >> due to cascading failures from tests that expected no progress >> output on stderr. To make sure I covered 100% I also tried making >> the display() function in progress.c a NOOP on top of that (it's >> the calls to start_progress_delay() and stop_progress()) that >> matter. >> >> That doesn't hit the BUG() either. Some tests fail in that mode >> due to a combination of the overzealous isatty(2) munging noted >> above, and the tests that are testing that the progress output >> itself is present (but for testing I'd made display() a NOOP). >> >> Between those three points I think it's safe to go ahead with this >> change. > > The above analysis only considers _our_ _current_ codebase. > > However, even though this might be safe now, it doesn't mean that it > will remain safe in the future, as we might add new progress lines > that lack test coverage (though hopefully won't), and would hit that > BUG() at a user. > > Furthermore, even though this might be safe in our codebase, it > doesn't mean that it is safe in some 20+k forks of Git that exist on > GitHub alone (I for one have a git command or two with in my fork > which output progress lines but, sadly, have zero test coverage). > > But more importantly, even though it might be safe to do so, that > doesn't mean that it's a good idea to do so. The commit message does > little to justify why it is conceptually a good idea to add this BUG() > to the progress code in a way that it can trigger outside of the test > suite. I think partially I've addressed this above (i.e. in the incoming re-roll's update commit message), but I might not for the question of whether this is worth it overall. I'll update the commit message to address this specific case, which was missing from it. >> 1. 6f9d5f2fda1 (commit-graph: fix progress of reachable commits, 2020-07-09) >> 2. https://lore.kernel.org/git/20210620200303.2328957-3-szeder.dev@xxxxxxxxx >> 3. https://lore.kernel.org/git/patch-18.25-e21fc66623f-20210623T155626Z-avarab@xxxxxxxxx/ >> 4. https://lore.kernel.org/git/cover-00.25-00000000000-20210623T155626Z-avarab@xxxxxxxxx/ >> 5. b50c37aa44d (Merge branch 'ab/progress-users-adjust-counters' into >> next, 2021-09-10) >> 6. https://lore.kernel.org/git/877dffg37n.fsf@xxxxxxxxxxxxxxxxxxx/ >> >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> >> --- >> progress.c | 18 ++++++++++++++---- >> t/t0500-progress-display.sh | 11 +++++++++++ >> 2 files changed, 25 insertions(+), 4 deletions(-) >> >> diff --git a/progress.c b/progress.c >> index b9369e9a264..a31500f8b2b 100644 >> --- a/progress.c >> +++ b/progress.c >> @@ -46,6 +46,7 @@ struct progress { >> }; >> >> static volatile sig_atomic_t progress_update; >> +static struct progress *global_progress; >> >> /* >> * These are only intended for testing the progress output, i.e. exclusively >> @@ -219,11 +220,16 @@ void progress_test_force_update(void) >> progress_interval(SIGALRM); >> } >> >> -static void set_progress_signal(void) >> +static void set_progress_signal(struct progress *progress) >> { >> struct sigaction sa; >> struct itimerval v; >> >> + if (global_progress) >> + BUG("'%s' progress still active when trying to start '%s'", >> + global_progress->title, progress->title); >> + global_progress = progress; > > This function is called set_progress_signal(), so checking and setting > 'global_progress' feels out of place here; it would be better to do > that in start_progress_delay(). > >> + >> if (progress_testing) >> return; >> >> @@ -241,10 +247,14 @@ static void set_progress_signal(void) >> setitimer(ITIMER_REAL, &v, NULL); >> } >> >> -static void clear_progress_signal(void) >> +static void clear_progress_signal(struct progress *progress) >> { >> struct itimerval v = {{0,},}; >> >> + if (!global_progress) >> + BUG("should have active global_progress when cleaning up"); >> + global_progress = NULL; > > Likewise. *Nod* cleaned up this whole part, which became much simpler overall as a result, thank you.