Re: [PATCH v3 2/2] usage: add warn_once() helper for repeated warnings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/1/2022 8:29 AM, Ævar Arnfjörð Bjarmason wrote:> 
> On Wed, Jun 01 2022, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <derrickstolee@xxxxxxxxxx>
>>
>> The previous change added a warning when valid_remote() detects
>> credentials in the URL. Since remotes are validated multiple times per
>> process, this causes multiple warnings to print.
> 
> Why are we validating remotes multiple times per process? Can't we just
> do it once?
> 
> Is this a case of two callers going through the whole machinery and not
> being aware of one another?
> 
> Perhaps it's a pain to deal with that in this case, but it would be
> better to note why here than taking it as a given.

We could certainly investigate this more, but it seems like a more
problematic approach than the one taken here. We could add a "is_valid"
bit to struct remote, but then could some code path modify that struct
after it was validated?

>> To avoid these kinds of repeated, advisory warnings, create a new
>> warn_once() helper that behaves the same as warning(), but only after
>> formatting the output string and adding it to a strset. If that addition
>> signals that the string already exists in the strset, then do not print
>> the warning.
> 
> This feels quite iffy given the low-level API & the rest of it aiming to
> be thread-safe, see 2d3c02f5db6 (die(): stop hiding errors due to
> overzealous recursion guard, 2017-06-21) for such a case.

It makes sense to keep the low-level library as thread-safe as possible.

Would it be enough to document this particular caller as not thread-safe?

>> In the case of the credentials in a URL, the existing test demonstrates
>> this per-process limitation: 'git clone' runs 'git-remote-curl' as a
>> child process, giving two messages. This is an improvement over the
>> previous six messages.
> 
> If we know about this limitation and we're going to be checking the same
> URLs why not do slightly better and pass down a "don't warn please" to
> the sub-process?>> Or do we think it might warn about *different* URLs, aren't these always
> the same (i.e. we read the full config)?

While this example of 'git clone' to 'git-remote-curl' will not use a
different URL, I can imagine other child process connections that could
expose a different URL. Having a blanket statement "don't emit these
warnings" seems unlikely to be helpful for those cases.

> If you used advice instead of a warning you'd get a config key to pass
> for free to disable it, and that might be a better thing to do in any
> case (i.e. a better fit for this case). But we could also just check
> getenv("GIT_URL_REDACTED_WARNING") or whatever, which seems easy
> enough...

To make the environment variable work properly, we would need to pass a
list of URLs in the variable that could seed the strset, to avoid dropping
warnings for new cases as mentioned earlier.

>> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
>> index cba3553b7c4..6ae3eec9eb6 100755
>> --- a/t/t5601-clone.sh
>> +++ b/t/t5601-clone.sh
>> @@ -75,7 +75,10 @@ test_expect_success 'clone warns or fails when using username:password' '
>>  	test_must_fail git -c fetch.credentialsInUrl=allow clone https://username:password@localhost attempt1 2>err &&
>>  	! grep "URL '\''https://username:<redacted>@localhost'\'' uses plaintext credentials" err &&
>>  	test_must_fail git -c fetch.credentialsInUrl=warn clone https://username:password@localhost attempt1 2>err &&
>> -	grep "warning: URL '\''https://username:<redacted>@localhost'\'' uses plaintext credentials" err &&
>> +	grep "warning: URL '\''https://username:<redacted>@localhost'\'' uses plaintext credentials" err >warnings &&
>> +	# The warning is printed twice, for the two processes:
>> +	# "git clone" and "git-remote-curl".
>> +	test_line_count = 2 warnings &&
>>  	test_must_fail git -c fetch.credentialsInUrl=die clone https://username:password@localhost attempt2 2>err &&
>>  	grep "fatal: URL '\''https://username:<redacted>@localhost'\'' uses plaintext credentials" err
>>  '
> 
> Hrm, between 1/2 and this I think this is a good example of a "just use
> test_cmp" caveat.
> 
> I.e. reading 1/2 it's snuck past us that there's this existing caveat in
> the warning being added, i.e. that we should really warn 1 times, but
> are doing in N times, but as we use ad-hoc "grep" instead of
> test_cmp-ing the full output that's not obvious.
> 
> I think this would be much more obvious as:
> 
> 	warning="<your warning msg>" &&
> 	cat >expect <<-EOF &&
> 	$warning
>         $warning
> 	EOF

There are other warnings coming over stderr, including the "Cloning
into..." message and the resulting "fatal:" message because that URL
doesn't actually exist. I chose to decouple this test to the exact
phrasing of those messages.

> But even better (and per the above, I'm not convinced about the
> direction, but leaving that aside): Here we have a choice between having
> 1/2 and 2/2 in that order and having 2/2 add a new API that has its
> first user right away, but at the cost of fixing a bug we just
> introduced in 1/2.
> 
> I think even if we can't find another API user for this proposed usage.c
> addition, just adding it without a user & adding what's now 1/2 as the
> 2nd commit would be preferrable. Then we're not doing a "oops, bug fix"
> while we're at it.

I wouldn't call this a "oops, bug fix" but instead "let's improve what was
working, but noisy." This also gives us the opportunity to see the code in
usage.c get tested immediately as it is introduced, which is an important
property to keep in mind when organizing a patch series.

>> +static struct strset prev_warnings = STRSET_INIT;
>> +
>> +void warn_once(const char *warn, ...)
>> +{
> 
> If we do end up keeping this (per the above I'm thinking while it's just
> this caller it should probably own this problem):
> 
> I have a local change to clean up this warn v.s. warning inconsistency
> in usage.c, but now it's all in internal code.
> 
> But let's not add to it by adding a warn_once(), it should be called
> warning_once().

Sure. The phrase "warn once" is a valid sentence, while "warning once" is
not, but I can see some benefit to having consistent naming here.

If the "warning()" method was not already present so much throughout the
codebase, then I would advocate that "warn()" is a better verb form. This
would be similar to "die()" and even "error()" could be interpreted as a
verb. However, that would be too large of a change to be worthwhile.

> We're also missing an "errno" variant, which might be fine for now,
> ditto variants for the other non-fatal functions (error &
> die_message). That might be OK for now, but probably worth noting.

Noted.

>> +void warn_once(const char *warn, ...)
>> +{
>> +	char buf[1024];
>> +	va_list params;
>> +	va_start(params, warn);
>> +
>> +	if (vsnprintf(buf, sizeof(buf), warn, params) >= 0) {
> 
> It doesn't matter for the end result, but let's compare with "< 0" like
> the other vsnprintf() caller in the file.

Except that we do something after this block in both cases, so the
condition needs to be swapped or else duplicate code.

> And how is this unlike that vsnprintf() in not needing the same error
> handling vreportf() does here:
> 
> 	*p = '\0'; /* vsnprintf() failed, clip at prefix */
> 
> Seems like either a bug, or that other code being something we should be
> dropping.

No, because the params are passed to warn_routine() in the section of code
you cropped out:

+	va_start(params, warn);
+	warn_routine(warn, params);
+	va_end(params);

That warn_routine does the "clip at prefix" fix. if we have a failure in
vsnprintf() here, then we are skipping the "only once" check and going
straight to that logic.

>> +		if (!strset_add(&prev_warnings, buf)) {
> 
> More on the general API usability front: E.g. builtin/pack-objects.c
> seems like it could make use of thing like this in two places.
> 
> But the API you've added here would only be usable for 1/2, i.e. you
> format and de-dup on the full warning, whereas at least 1/2 of those
> callers wants to de-dup on the un-formatted string (grep for 'static int
> warned').
> 
> Which (and I'm sounding like a broken record:) is another thing that me
> wonder if the general API is premature, i.e. it's a red flag that we
> have existing code that could benefit from it, if not for an arbitrary
> limitation, in this case fixing the limitation would mean the churn of
> either adding a new function, or a new parameter to all callers. I.e. a
> "int dedup_on_args" or similar.

You are advocating that since there is one example of something that
doesn't fit this exact scenario, then we shouldn't move forward on
something that _does_ have immediate use (and potential use in other
places). This seem unnecessarily pedantic.

This specific method is a good foundation for possibly extending it if
there is value in doing so.

>> +			va_end(params);
>> +			return;
>> +		}
>> +	}
>> +	va_end(params);
>> +
>> +	va_start(params, warn);
> 
> I'm rusty on the varargs API (and haven't tested) but didn't we need
> va_copy() if we're iterating twice, or was that just if we're passing it
> to another function and using it ourselves...

Well, outside of the fact that this function is tested and it works,
the documentation says this about the 'params' argument (named 'ap'
in the function prototype):

  If ap has already been passed as first argument to a previous call to
  va_start or va_copy, it shall be passed to va_end before calling this
  function.

So, va_end(params) makes it safe to call va_start(params, warn) again.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux