Re: [PATCH] parse-options: make parse_options_check() test-only

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 02 Mar 2022 11:52:22 +0100

On Tue, Mar 01 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>
>> On Tue, Mar 01 2022, Junio C Hamano wrote:
>>
>>> The array of options given to the parse-options API is sanity
>>> checked for reuse of a single-letter option for multiple entries and
>>> other programmer mistakes by calling parse_options_check() from
>>> parse_options_start().  This allows our developers to catch silly
>>> mistakes early, but all callers of parse-options API pays this cost.
>>> Once the set of options in an array is validated and passes this
>>> check, until a programmer modifies the array, there is no way for it
>>> to fail the check, which is wasteful.
>>
>> That's not true due to the "git rev-parse --parseopt" interface. I'd be
>
> Meaning that a parse-options array can be fed by "rev-parse --parseopt"
> and having the sanity check enabled does help the use case?  Even there,
> I would say that once the script writer finishes developing the script
> that uses "rev-parse --parseopt", setting the parseopt input in stone,
> there is no need to check the same thing over and over again.  Am I
> mistaken?  Does "rev-parse --parseopt" that is fed the same input
> sometimes trigger the sanity check and sometimes not?

If we're declaring that "git rev-parse --parseopt" is something that was
only ever intended for in-tree usage, then sure, that should hold true.

But "git rev-parse" is documented as plumbing, and we document
--parseopt as a generic option parsing mechanism you can use in
shell scripts.

So out-of-tree users wouldn't guard against
GIT_TEST_PARSE_OPTIONS_CHECK, and I wouldn't be surprised if we could
e.g. segfault on some subsequent code if some of the sanity checks
aren't happening anymore.

Now, I'd be quite happy if we declared that it's for our use only, and
could remove it when the last in-tree *.sh user went away. There's a bit
of complexity in parse_options() required only for its use...

>> I see the benefit of Johannes's suggestion of checking this once (but
>> with t0012-help.sh etc. we're nowhere near being able to do that).
>>
>> Now this runs for the whole test suite, so our tests will have the
>> same behavior.
>
> The code for sanity check is there ONLY to help those who develop
> while they develop, and it is logical to enable it during our tests.
> There is no reason to trigger the sanity check in the end-user
> environment, no?

I don't see the benefit of skipping it. Your commit message mentions
"but all callers of parse-options API pays this cost". As a quick & dumb
perf test I tried:

	diff --git a/parse-options.c b/parse-options.c
	index 6e57744fd22..cabea35e8b1 100644
	--- a/parse-options.c
	+++ b/parse-options.c
	@@ -523,7 +523,10 @@ static void parse_options_start_1(struct parse_opt_ctx_t *ctx,
	        if ((flags & PARSE_OPT_ONE_SHOT) &&
	            (flags & PARSE_OPT_KEEP_ARGV0))
	                BUG("Can't keep argv0 if you don't have it");
	-       parse_options_check(options);
	+       while (1) {
	+               printf(".");
	+               parse_options_check(options);
	+       }
	 }

	 void parse_options_start(struct parse_opt_ctx_t *ctx,

And:

    ./git [am|rebase] | pv >/dev/null

I get around 4MiB/s. I.e. we can do this check ~4 million times/sec on
my computer with -O3; with -O0 -g it's ~3MiB/s.

So the performance cost is trivial & not worth worrying about.
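
To spell out the arithmetic: each loop iteration prints one "." (one
byte), so bytes/sec as reported by pv equals checks/sec. A minimal
sketch of the conversion follows; the helper names are made up for
illustration and are not part of git:

```c
/* Bytes in one MiB, the unit pv reports. */
#define BYTES_PER_MIB (1024.0 * 1024.0)

/*
 * Hypothetical helper: one '.' is printed per parse_options_check()
 * call in the patched loop, so bytes/sec == checks/sec.
 */
static double checks_per_second(double mib_per_second)
{
	return mib_per_second * BYTES_PER_MIB;
}

/* Hypothetical helper: average wall-clock time per check, in ns. */
static double nanoseconds_per_check(double mib_per_second)
{
	return 1e9 / checks_per_second(mib_per_second);
}
```

With 4MiB/s that works out to roughly 238ns per check, and with 3MiB/s
to roughly 318ns. Those figures also include the printf() and pipe
overhead, so they are an upper bound on the cost of the check itself.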

>> So aren't we shaving microseconds off the runtime here?
>
> No, the problem I have with the runtime check is more at the
> conceptual level.  Those who remove assert() by setting NDEBUG
> would not be doing so to save nanoseconds, either.

I think the trade-off of not having to worry about runtime
vs. "development build" checks is one we've handled well with BUG(),
i.e. by not having it be an assert().
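
To illustrate the distinction, here is a minimal sketch. TOY_BUG() is a
stand-in made up for this example that only loosely imitates BUG()
(report the location, then abort()), and the two division helpers are
hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for an always-on check; not git's actual BUG(). */
#define TOY_BUG(msg) do { \
	fprintf(stderr, "BUG: %s:%d: %s\n", __FILE__, __LINE__, (msg)); \
	abort(); \
} while (0)

/* The check vanishes entirely when compiled with -DNDEBUG. */
static int div_with_assert(int a, int b)
{
	assert(b != 0);
	return a / b;
}

/* The check is present in every build, whatever the flags are. */
static int div_with_bug(int a, int b)
{
	if (!b)
		TOY_BUG("division by zero");
	return a / b;
}
```

With the second form there is only one behavior to reason about: the
code that runs under the test suite is the code that runs in the wild.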

E.g. in this case we have parse_options_concat(), so you can dynamically
construct the options to be checked.

I happen to have looked in detail at all of that code in the past, and I
don't *think* it's doing something "actually dynamic". I.e. it should be
the same when the tests run and when git runs in the wild.

But having to know and check that when using or changing the API is just
more state to keep in your head.
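
As a rough sketch of why concatenation matters for this kind of check:
two option arrays can each be valid on their own and still clash once
joined. The struct and functions below are simplified and hypothetical;
they are not git's struct option, parse_options_concat() or
parse_options_check():

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified option record. */
struct toy_option {
	char short_name;       /* 0 means "no short name" */
	const char *long_name; /* NULL terminates the array */
};

/* Count entries up to, but not including, the terminator. */
static size_t toy_count(const struct toy_option *o)
{
	size_t n = 0;

	while (o[n].long_name)
		n++;
	return n;
}

/* Return a newly allocated array: a's entries followed by b's. */
static struct toy_option *toy_concat(const struct toy_option *a,
				     const struct toy_option *b)
{
	size_t na = toy_count(a), nb = toy_count(b);
	struct toy_option *out = malloc((na + nb + 1) * sizeof(*out));

	if (!out)
		return NULL;
	memcpy(out, a, na * sizeof(*out));
	memcpy(out + na, b, nb * sizeof(*out));
	out[na + nb].short_name = 0;
	out[na + nb].long_name = NULL; /* terminator */
	return out;
}

/* Return the first short name that is used twice, or 0 if none is. */
static char toy_find_duplicate_short(const struct toy_option *o)
{
	unsigned char seen[256] = { 0 };

	for (; o->long_name; o++) {
		unsigned char c = (unsigned char)o->short_name;

		if (!c)
			continue;
		if (seen[c])
			return o->short_name;
		seen[c] = 1;
	}
	return 0;
}
```

If one array has { 'q', "quiet" } and the other { 'q', "quick" }, each
passes the duplicate check alone, but the concatenated array does not.
Whether that can vary at runtime depends on how the caller builds the
arrays, which is exactly the thing you'd have to go and verify.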