On Wed, Jun 01 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@xxxxxxxxxx>
>
> The previous change added a warning when valid_remote() detects
> credentials in the URL. Since remotes are validated multiple times per
> process, this causes multiple warnings to print.

Why are we validating remotes multiple times per process? Can't we just
do it once? Is this a case of two callers going through the whole
machinery without being aware of one another?

Perhaps it's a pain to deal with that in this case, but it would be
better to note why here than to take it as a given.

> To avoid these kinds of repeated, advisory warnings, create a new
> warn_once() helper that behaves the same as warning(), but only after
> formatting the output string and adding it to a strset. If that addition
> signals that the string already exists in the strset, then do not print
> the warning.

This feels quite iffy given that this is a low-level API and the rest of
usage.c aims to be thread-safe; see 2d3c02f5db6 (die(): stop hiding
errors due to overzealous recursion guard, 2017-06-21) for such a case.

I.e. there *is* racy code there already that (ab)uses a no-lock pattern
to optimistically abort early when we'd emit N copies of the same
recursion message. But are we as confident that concurrent strset
callers in multiple threads will behave themselves? I'd think we should
be adding this to a caller that knows it's not threaded, e.g. whatever
calls check_if_creds_in_url().

> In the case of the credentials in a URL, the existing test demonstrates
> this per-process limitation: 'git clone' runs 'git-remote-curl' as a
> child process, giving two messages. This is an improvement over the
> previous six messages.

If we know about this limitation and we're going to be checking the same
URLs, why not do slightly better and pass down a "don't warn, please"
flag to the sub-process? Or do we think it might warn about *different*
URLs? Aren't these always the same (i.e. we read the full config)?
If you used advice instead of a warning you'd get a config key to
disable it for free, and that might be a better fit for this case in any
case. But we could also just check getenv("GIT_URL_REDACTED_WARNING") or
whatever, which seems easy enough...

> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index cba3553b7c4..6ae3eec9eb6 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -75,7 +75,10 @@ test_expect_success 'clone warns or fails when using username:password' '
>  	test_must_fail git -c fetch.credentialsInUrl=allow clone https://username:password@localhost attempt1 2>err &&
>  	! grep "URL '\''https://username:<redacted>@localhost'\'' uses plaintext credentials" err &&
>  	test_must_fail git -c fetch.credentialsInUrl=warn clone https://username:password@localhost attempt1 2>err &&
> -	grep "warning: URL '\''https://username:<redacted>@localhost'\'' uses plaintext credentials" err &&
> +	grep "warning: URL '\''https://username:<redacted>@localhost'\'' uses plaintext credentials" err >warnings &&
> +	# The warning is printed twice, for the two processes:
> +	# "git clone" and "git-remote-curl".
> +	test_line_count = 2 warnings &&
>  	test_must_fail git -c fetch.credentialsInUrl=die clone https://username:password@localhost attempt2 2>err &&
>  	grep "fatal: URL '\''https://username:<redacted>@localhost'\'' uses plaintext credentials" err
> '

Hrm, between 1/2 and this I think this is a good example of a "just use
test_cmp" caveat. I.e. reading 1/2 it snuck past us that there's this
existing caveat in the warning being added, i.e. that we should really
warn 1 time but are warning N times, but as we use an ad-hoc "grep"
instead of test_cmp-ing the full output, that's not obvious.
I think this would be much more obvious as:

	warning="<your warning msg>" &&
	cat >expect <<-EOF &&
	$warning
	$warning
	EOF

Or whatever, followed by a:

	test_cmp expect actual

Then 1/2 could make a brief note that we're leaving this duplication
issue for now, and 2/2 would fix it.

But even better (and per the above I'm not convinced about the
direction, but leaving that aside): here we have a choice between having
1/2 and 2/2 in this order, where 2/2 adds a new API that has its first
user right away, but at the cost of fixing a bug we just introduced in
1/2.

I think even if we can't find another user for this proposed usage.c
addition, just adding it without a user and making what's now 1/2 the
2nd commit would be preferable. Then we're not doing an "oops, bug fix"
while we're at it.

> +static struct strset prev_warnings = STRSET_INIT;
> +
> +void warn_once(const char *warn, ...)
> +{

If we do end up keeping this (per the above I'm thinking that while it's
just this one caller, the caller should probably own this problem):

I have a local change to clean up this "warn" v.s. "warning"
inconsistency in usage.c, but for now it's all in internal code. Let's
not add to it by adding a warn_once(); it should be called
warning_once().

We're also missing an "errno" variant, ditto variants for the other
non-fatal functions (error() & die_message()). That might be OK for
now, but it's probably worth noting.

> +void warn_once(const char *warn, ...)
> +{
> +	char buf[1024];
> +	va_list params;
> +	va_start(params, warn);
> +
> +	if (vsnprintf(buf, sizeof(buf), warn, params) >= 0) {

It doesn't matter for the end result, but let's compare with "< 0" like
the other vsnprintf() caller in the file. And how is this call unlike
that vsnprintf() in not needing the same error handling vreportf() does
here:

	*p = '\0'; /* vsnprintf() failed, clip at prefix */

Seems like either a bug here, or that other code is something we should
be dropping.
> +		if (!strset_add(&prev_warnings, buf)) {

More on the general API usability front: e.g. builtin/pack-objects.c
seems like it could make use of a thing like this in two places. But
the API you've added here would only be usable for one of the two,
i.e. you format and de-dup on the full warning, whereas at least one of
those callers wants to de-dup on the un-formatted string (grep for
'static int warned').

Which (and I'm sounding like a broken record:) is another thing that
makes me wonder if the general API is premature, i.e. it's a red flag
that we have existing code that could benefit from it if not for an
arbitrary limitation. In this case, fixing the limitation would mean
the churn of either adding a new function, or a new parameter for all
callers, i.e. an "int dedup_on_args" or similar.

> +			va_end(params);
> +			return;
> +		}
> +	}
> +	va_end(params);
> +
> +	va_start(params, warn);

I'm rusty on the varargs API (and haven't tested) but didn't we need
va_copy() if we're iterating twice, or was that just if we're passing it
to another function and also using it ourselves?