I've adjusted the In-reply-to header here. This is really in reply to <20210916183137.GD76263@xxxxxxxxxx>, but that's a reply to a previous & longer RFC-ish iteration of the series. https://lore.kernel.org/git/cover-0.8-0000000000-20210722T125012Z-avarab@xxxxxxxxx/ is what's currently queued as ab/only-single-progress-at-once in "seen". Let's continue the discussion relevant to the currently proposed patches in this thread. On Thu, Sep 16 2021, SZEDER Gábor wrote: > On Wed, Jun 23, 2021 at 07:48:12PM +0200, Ævar Arnfjörð Bjarmason wrote: >> The progress.c code makes a hard assumption that only one progress bar >> be active at a time (see [1] for a bug where this wasn't the case), >> but nothing has asserted that that's the case. Let's add a BUG() >> that'll trigger if two progress bars are active at the same time. > > I very much dislike the idea of any BUG() in the progress code that > can trigger outside of the test suite. First regarding the state of this series. I'd understood your: Please don't advance this to next yet. I've found some issues with it, but not the time to raise them. In https://lore.kernel.org/git/20210901050406.GB76263@xxxxxxxxxx/ to mean (, by perhaps overly reading between the lines out of paranoia,) that you'd found some case where we'd hit the BUG() here. Am I correct that you haven't, but are concerned that we've left some case undiscovered where we might? I'm wondering given your replies there if you thought that the series to be merged down was that 25 patch one. I agree (e.g. because of your https://lore.kernel.org/git/20210916183711.GE76263@xxxxxxxxxx/) that merging that one down wouldn't be a good idea in its current state. Of course you may still have valid concerns etc., just trying to clarify if there's some specific known outstanding issue or not with the current 8-patch series in "seen". > As the number of progress-related fixes clearly show, we often misuse > the progress API, and, arguably, a bug is a bug is a bug, so strictly > speaking a BUG() is not wrong here. > > However, the progress line is merely a UI gimmick, not a crucial part > of Git, and none of those progress bugs affected the correctness of > the operation itself. Worse, calling BUG() during some operations > (e.g. 'git commit-graph write', the worst offender when it comes to > progress bugs) can leave a lockfile behind, resulting in scary errors > and requiring manual cleanup in the .git directory, which is a much > worse UX than showing some bogus values or out of order progress > lines. Yes, I agree that actually hitting this BUG() would absolutely suck, and that we shouldn't consider this patch if we weren't certain, or near enough, that we wouldn't hit it. As covered in the cover letter of the earlier series I sent (at https://lore.kernel.org/git/cover-00.25-00000000000-20210623T155626Z-avarab@xxxxxxxxx/) I share your concerns that it's hard to uncover if we've reached sufficient coverage to be certain that we should add certain assertions, i.e. some of the ones your initial series added around the actual progress counting that we do. In this case though, I do think we can safely add this. Maybe you have run into the BUG() and I'm about to get some egg on my face, but here's why I think we won't hit it. This is a *much* narrower case than the general thread quagmire around display_progress() etc., since this only covers the start/stop progress calls. All of the multi-threaded code we have does the equivalent of: p = start_progress(); start_threads(); /* do stuff in threads, including with "p" */ stop_threads(); stop_progress(p); Do you agree that if this is the pattern everywhere that this patch would be safe in its current form? To assert that that's the case (I'd read the code before) I instrumented the tests to BUG() out if we ever start() or stop() where getpid() != gettid(), which on Linux means you're inside a pthread. The diff-on-top is at the end of this E-Mail. With that and running the tests with: GIT_TEST_BUG_START=1 \ GIT_TEST_BUG_STOP=1 We pass all tests, i.e. there's no current callers that call start_progress() or start_progress() and do so in anything but the main program thread. Well, there could be, per the concerns I had in the CL linked above, i.e. sometimes that start_progress() is guarded by a preceding isatty() or whatever, but having looked at those / grepped the union of pthread + progress in the source I think there's no such cases. That's *much* easier to eyeball than "does all this control flow around display_progress() make sense?". Getting back on topic, we pass all tests with that, but we'll fail with: GIT_TEST_BUG_DISPLAY=1 \ GIT_TEST_BUG_DISPLAY_HARDER=1 Why the second variable? Because I marked up the two callers that call display_progress() within threads, it's just index-pack and pack-objects. If you don't set GIT_TEST_BUG_DISPLAY_HARDER=1 we'll whitelist those two, and all tests pass. diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 6cc48902170..05c82dc6e6d 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1229,6 +1229,7 @@ static void resolve_deltas(void) QSORT(ofs_deltas, nr_ofs_deltas, compare_ofs_delta_entry); QSORT(ref_deltas, nr_ref_deltas, compare_ref_delta_entry); + no_progress_bug(); if (verbose || show_resolving_progress) progress = start_progress(_("Resolving deltas"), nr_ref_deltas + nr_ofs_deltas); diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index ec8503563a6..97ad321f67c 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3072,6 +3072,7 @@ static void prepare_pack(int window, int depth) if (nr_deltas && n > 1) { unsigned nr_done = 0; + no_progress_bug(); if (progress) progress_state = start_progress(_("Compressing objects"), nr_deltas); diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh index f3aba5d6cbb..efa1e5e5bdf 100755 --- a/ci/run-build-and-tests.sh +++ b/ci/run-build-and-tests.sh @@ -5,6 +5,9 @@ . ${0%/*}/lib.sh +export GIT_TEST_BUG_START=true +export GIT_TEST_BUG_STOP=true + case "$CI_OS_NAME" in windows*) cmd //c mklink //j t\\.prove "$(cygpath -aw "$cache_dir/.prove")";; *) ln -s "$cache_dir/.prove" t/.prove;; diff --git a/progress.c b/progress.c index 14a023f4b43..bdebf100ad8 100644 --- a/progress.c +++ b/progress.c @@ -16,6 +16,7 @@ #include "trace.h" #include "utf8.h" #include "config.h" +#include <unistd.h> #define TP_IDX_MAX 8 @@ -202,8 +203,18 @@ void display_throughput(struct progress *progress, uint64_t total) display(progress, progress->last_value, NULL); } +static int no_bug_please; +void no_progress_bug(void) +{ + no_bug_please = 1; +} + void display_progress(struct progress *progress, uint64_t n) { + if (getpid() != gettid() && getenv("GIT_TEST_BUG_DISPLAY") && + (getenv("GIT_TEST_BUG_DISPLAY_HARDER") || !no_bug_please)) + BUG("display: pid = %d, tid = %d: %s\n", getpid(), gettid(), + progress ? progress->title : "N/A"); if (progress) display(progress, n, NULL); } @@ -281,6 +292,11 @@ static struct progress *start_progress_delay(const char *title, uint64_t total, progress->split = 0; set_progress_signal(progress); trace2_region_enter("progress", title, the_repository); + + if (getpid() != gettid() && getenv("GIT_TEST_BUG_START")) + BUG("start: pid = %d, tid = %d: %s\n", getpid(), gettid(), + title ? title : "N/A"); + return progress; } @@ -334,6 +350,11 @@ static void finish_if_sparse(struct progress *progress) void stop_progress(struct progress **p_progress) { + no_bug_please = 0; + if (getpid() != gettid() && getenv("GIT_TEST_BUG_STOP")) + BUG("stop: pid = %d, tid = %d: %s\n", getpid(), gettid(), + ((p_progress && *p_progress) ? (*p_progress)->title : "N/A")); + if (!p_progress) BUG("don't provide NULL to stop_progress"); diff --git a/progress.h b/progress.h index f1913acf73f..2ebb1da2666 100644 --- a/progress.h +++ b/progress.h @@ -20,5 +20,6 @@ struct progress *start_delayed_sparse_progress(const char *title, uint64_t total); void stop_progress(struct progress **progress); void stop_progress_msg(struct progress **progress, const char *msg); +void no_progress_bug(void); #endif