Re: [PATCH v4 2/2] parse-options.c: add style checks for usage-strings

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Tue, 1 Mar 2022 20:37:41 +0100 (CET)

Hi Junio,

On Mon, 28 Feb 2022, Junio C Hamano wrote:

> Abhradeep Chakraborty <chakrabortyabhradeep79@xxxxxxxxx> writes:
>
> > Okay, that's great. But one thing I want to ask - How the discussion
> > for `adding check for usage strings` will be held i.e. Whether the
> > idea is discarded for now.
> >
> > If it is not discarded, then how to proceed? Johannes prefers the first
> > version and Ævar prefers the `add check to parse-options.c` version.
>
> My take on it is that the "first version" that uses an ad-hoc shell
> script will not become acceptably robust.

It is unfortunate that the challenge had been characterized as "parse C",
when in reality we are talking about highly idiomatic code. It's not like
we accept arbitrary input in the `OPT_...()` lines. We _really_ want the
option usage string to be a string that is enclosed in `N_()`.

Additionally, this is about Git's own code, not arbitrary C code provided
by users. That makes that shell script more on par with `t/chainlint.sed`
than with `contrib/coccinelle/*`.

Having said that...

> If coccinelle or other static analyzer can help us check more reliably,
> that would be great because we won't incur runtime cost of checking,
> like the embedded check we added in the latest version that we are
> tentatively removing.

I think that Julia gave us enough to work with, so we can (ab-)use
Coccinelle for static usage string checks, and we should probably do that,
too.

> I also think Dscho simply overreacted only because the check broke
> an in-flight topic that is from his group, which is not universally
> built, and the tests in it was written in such a way that the error
> output from the embedded check was not immediately available when
> run in the CI, making it harder to debug.  None of that is a fault
> in the approach of using the embedded check.

No, I would have reacted the same way if I had seen the failures in any
other topic, with an equally trivial fix that blooms into 42 separate test
case failures.

This explosion made me realize _why_ I found the suggestion to patch
`parse_options()` iffy in the first place: it replaces a static check with
a runtime check, which is almost always something that is regretted later.

And since Abhradeep is a new contributor, I found it important to steer
the direction toward sound advice that they can use over and over again
over the course of their career: whenever possible, prefer static checks
over runtime ones.

> If the embedded check were there from the beginning, together with
> tons of the existing checks done by parse_options_check(), the
> developers themselves of the in-flight topic(s) would have caught
> the problem, even before it hit the public CI.  I am very sure Dscho
> wouldn't have complained or even noticed that you added a new check
> to the parse_options_check().

Indeed, if no static check had been proposed first, I would not have
caught on to the unfortunate suggestion to use a runtime check _instead_.

> So from my point of view, plan should be
>
>  (0) I have been assuming that the check we removed tentatively is
>      correct and the breakage in in-flight topic caught usage
>      strings that were malformed.  If not, we need to tweak it to
>      make sure it does not produce false positives.
>
>  (1) Help Microsoft folks fix the in-flight topic with faulty usage
>      strings.

You're so sweet, but I already did that in parallel.

>
>  (2) Rethink if parse_options_check() can be made optional at
>      runtime, which would (a) allow our test to enable it, and allow
>      us to test all code paths that use parse_options() centrally,
>      and (b) allow us to bypass the check while the end-user runs
>      "git", to avoid overhead of checking the same option[] array,
>      which does not change between invocations of "git", over and
>      over again all over the world.
>
>      We may add the check back to parse_options_check() after doing
>      the above.  There are already tons of "check sanity of what is
>      inside option[]" in there, and it would be beneficial if we can
>      separate out from parse_options_start() the sanity checking
>      code, regardless of this topic.
>
>  (3) While (2) is ongoing, we can let people also explore static
>      analysis possibilities.

Of course, if we can convince Coccinelle (together with Python) to give us
the static check, we might very well be able to port more of
`parse_options_check()` from runtime checks to static ones, which would be
a clear win.

If that is possible, we could save ourselves a lot of time by skipping (2)
altogether.

And as I said, Julia's advice looked really good. If only I wasn't
desperately short on time, I would have given it a try because it sounds
not only fun but also very, very useful in Git's context.

Ciao,
Dscho