Re: [PATCH v4 7/9] t3437: test script for fixup [-C|-c] options in interactive rebase

Eric Sunshine <sunshine@xxxxxxxxxxxxx> · Wed, 3 Feb 2021 00:44:40 -0500

On Tue, Feb 2, 2021 at 5:02 AM Christian Couder
<christian.couder@xxxxxxxxx> wrote:
> On Tue, Feb 2, 2021 at 3:01 AM Eric Sunshine <sunshine@xxxxxxxxxxxxxx> wrote:
> > On Fri, Jan 29, 2021 at 1:25 PM Charvi Mendiratta <charvi077@xxxxxxxxx> wrote:
> > > +               merge_*|fixup_*)
> > > +                       action=$(echo "$line" | sed 's/_/ /g');;
> >
> > What is "merge_" doing here? It doesn't seem to be used by this patch.
>
> Yeah, it's not used, but it might be a good thing to add this for
> consistency while at it.

It confuses readers (as it did to me), causing them to waste
brain-cycles trying to figure out why it's present. Thus, it would be
better to add it when it's actually needed. The waste of brain-cycles
and time is especially important on a project like Git for which
reviewers and reviewer time are limited resources.

> > > +# Copyright (c) 2018 Phillip Wood
> >
> > Did Phillip write this script? Is this patch based upon an old patch from him?
>
> Yeah, it might be a good idea to add a "Based-on-patch-by: Phillip ..."

Agreed.

> > The implementation of test_commit_message() is a bit hard to follow.
> > It might be simpler to write it more concisely and directly like this:
> >
> >     git show --no-patch --pretty=format:%B "$1" >actual &&
> >     case "$2" in
> >     -m) echo "$3" >expect && test_cmp expect actual ;;
>
> I think we try to avoid many commands on the same line.

For something this minor, it's not likely to matter but, of course, it
could be split over two lines:

    -m) echo "$3" >expect &&
        test_cmp expect actual ;;

> >     *) test_cmp "$2" actual ;;
> >     esac
>
> In general I am not sure that using $1, $2, $3 directly makes things
> easier to understand, but yeah, with the function documentation that
> you suggest, it might be better to write the function using them
> directly.

The direct $1, $2, etc. was just an example. It's certainly possible
to give them names even in the rewritten code I presented. One good
reason, however, for just using $1, $2, etc. is that $2 is not well
defined; sometimes it's a switch ("-m") and sometimes its a pathname,
so it's hard to invent a suitable variable name for it. Also, this
function becomes so simple (in the rewritten version) that explicit
variable names don't add a lot of value (the cognitive load is quite
low because the function is so short).

> > Style nit: In Git test scripts, the here-doc body and EOF are indented
> > the same amount as the command which opened the here-doc:
>
> I don't think we are very consistent with this and I didn't find
> anything about this in CodingGuidelines.
>
> In t0008 and t0021 for example, the indentation is more like:
>
>      cat >message <<-EOF &&
>           amend! B
>           ...
>           body
>      EOF
>
> and I like this style, as it seems clearer than the other styles.

I performed a quick survey of the heredoc styles in the tests. Here
are the results[1] of my analysis on the 'seen' branch:

total-heredocs=4128

same-indent=3053 (<<EOF & body & EOF share indent)

    cat >expect <<-\EOF
    body
    EOF

body-eof-indented=24 (body & EOF indented)

    cat >expect <<-\EOF
        body
        EOF

body-indented=735 (body indented; EOF not)

    cat >expect <<-\EOF
        body
    EOF

left-margin=316 (<<EOF indented; body & EOF not)

        cat >expect <<\EOF
    body
    EOF

So, the indentation recommended in my review -- with 3053 instances
out of 4128 heredocs -- is by far the most prevalent in the project.

[1]: Note that there is a miniscule amount of inaccuracy in the
numbers because there are a few cases in which heredocs contain other
heredocs, and some scripts build heredocs piecemeal when constructing
other scripts, and I didn't bother making my analysis script handle
those few cases. The inaccuracy is tiny, thus not meaningful to the
overall picture.

> > I see that you mirrored the implementation of FAKE_LINES handling of
> > "exec" here for "fixup", but the cases are quite different. The
> > argument to "exec" is arbitrary and can have any number of spaces
> > embedded in it, which conflicts with the meaning of spaces in
> > FAKE_LINES, which separate the individual commands in FAKE_LINES.
> > Consequently, "_" was chosen as a placeholder in "exec" to mean
> > "space".
> >
> > However, "fixup" is a very different beast. Its arguments are not
> > arbitrary at all, so there isn't a good reason to mirror the choice of
> > "_" to represent a space, which leads to rather unsightly tokens such
> > as "fixup_-C". It would work just as well to use simpler tokens such
> > as "fixup-C" and "fixup-c", in which case t/lib-rebase.sh might parse
> > them like this (note that I also dropped `g` from the `sed` action):
> >
> >     fixup-*)
> >         action=$(echo "$line" | sed 's/-/ -/');;
>
> I agree that "fixup" arguments are not arbitrary at all, but I think
> it makes things simpler to just use one way to encode spaces instead
> of many different ways.

Is that the intention here, though? Is the idea that some day `fixup`
will accept arbitrary arguments thus needs to encode spaces? If not,
then mirroring the treatment given to `exec` confuses readers into
thinking that it will/should accept arbitrary arguments. I brought
this up in my review specifically because it was confusing to a person
(me) new to this topic and reading the patches for the first time. The
more specific and exact the code can be, the less likely it will
confuse readers in the future.

Anyhow, it's a minor point, not worth expending a lot of time discussing.

> > It feels clunky and fragile for this test to be changing
> > "expected-message" which was created in the "setup" test and used
> > unaltered up to this point. If the content of "expected-message" is
> > really going to change from test to test (as I see it changes again in
> > a later test), then it would be easier to reason about the behavior if
> > each test gives "expected-message" the precise content it should have
> > in that local context. As it is currently implemented, it's too
> > difficult to follow along and remember the value of "expected-message"
> > from test to test. It also makes it difficult to extend tests or add
> > new tests in between existing tests without negatively impacting other
> > tests. If each test sets up "expected-message" to the precise content
> > needed by the test, then both those problems go away.
>
> Yeah, perhaps the global "expected-message" could be renamed for
> example "global-expected-message", and tests which need a specific one
> could prepare and use a custom "expected-message" (maybe named
> "custom-expected-message") without ever changing
> "global-expected-message".

That would be fine, though I wondered while reviewing the patch if a
global "expect-message" file was even needed since it didn't seem like
very many tests used it (but I didn't spend a lot of time counting the
exact number of tests due to the high cognitive load tracing how that
file might mutate as it passed through each test).

Another really good reason for avoiding having later tests depend upon
mutations from earlier tests, if possible, is that it makes it easier
to run tests selectively with --run or GIT_SKIP_TESTS.