Re: [PATCH v3 03/10] trailer: unify trailer formatting machinery

Linus Arver <linusa@xxxxxxxxxx> · Wed, 31 Jan 2024 16:46:03 -0800

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
>>> +				if (opts->unfold)
>>> +					unfold_value(&val);
>>
>> So, ... how would this affect an invocation of
>>
>>     $ git interpret-trailers --unfold
>>
>> which would have set opts.unfold to true in cmd_interpret_trailers()
>> and eventually called process_trailers() with it, which eventually
>> called print_all(), which conditionally called print_tok_val() but
>> never gave the val to unfold_value()?
>>
>>  ... goes and digs a bit more ...
>>
>> Ahhhh, original process_trailers() did this by calling
>> parse_trailers() that munged the value.  So its print_all()
>> codepath only had to print what has already been munged.
>>
>> But then in this new code, do we unfold only here?  I thought that
>> in the new code you moved around, the caller, whose name is now
>> interpret_trailers(), still calls parse_trailers() and that calls
>> the unfold_value().  So, are we doing redundant work that may merely
>> be unneeded (if we are lucky) or could be harmful (if things, other
>> than the unfold thing I just noticed, that are done both in
>> parse_trailers() and this function are not idempotent)?
>
> I was too confused by the code flow and resorted to tracing what
> happens when "git interpret-trailers --unfold" is run in a way
> similar to how t7513 does in gdb.  Shame shame.

In my larger series I've made the parser and formater much simpler
(removing nesting, calling helper functions, _only parsing once and only
once_, etc) to make both parsing and formatting much easier to
understand. While I am biased as the author of these refactors, I do
think they make the code simpler.

> In any case, the updated code does try to call unfold_value() in
> format_trailers() on "val" that has already been unfolded in
> parse_trailers().

Correct. But this was already the case before this series. IOW the
existing code assumes that this function is idempotent: we call
unfold_value() in parse_trailers() and again in format_trailer_info().

In my tests if I remove the redundant call to unfold_value() in
the new formatter (so that unfolding only happens during
parse_trailers()), all trailer-related tests still pass:

    prove -j47 t{7513,3507,7502,4205,3511,3428,6300}*.sh

FWIW in the larger series we prohibit the parser from making mutations
to the input (unfolding is one such mutation), and allow multiline
(folded) strings to be stored as is in "val", only unfolding the value
if we need to during formatting.

> This may not produce an incorrect result if
> unfold_value() is truly idempotent (I do not know), but even if it
> is correct for its handling of .unfold bit, this episode lowered my
> confidence level on the new code significantly, as I do not know
> what unintended effects [*] all the other new things done in this
> function have, or if none of these unintended effects are
> error-free.

I think dropping the redundant call to unfold_value() would help address
some of your concerns. Of course, all of the existing test cases pass
(with and without the redundant call). As for addressing _all_ (not just
some) concerns, I am not confident I can deliver that as an additional
patch or two to this series because ...

>> It could be that a bit more preparatory refactoring would clarify.
>> I dunno.

... this is basically what I've done in the larger series, because there
are many areas that could be simplified (not to mention addressing some
bugs some bugs are lurking around). Granted, those are more aggressive
refactorings, and are not "preparatory refactoring" at all.

I could just grow this series with another ~22 patches to include those
additional refactors, but I am hesitant about doing so, simply due to
the sheer number of them.

How would you like me to proceed, other than deleting the redundant
unfold_value() call in the next reroll?

For reference, here's a list of those 22 commits that build on this
series as a preview (roughly grouped on themes):

    trailer formatter: deprecate format_trailers()
    trailer formatter: introduce format_trailer_block()
    trailer parser: parse_trailer_v2() for '--trailer <...>' CLI arguments
    trailer parser: parse_trailer_v2() for configuration
    trailer parser: parse_trailer_v2() for trailer blocks
    trailer parser: introduce parse_trailer_v2()
    interpret-trailers: preserve trailers coming from the input
    trailer_block: remove trailer_strings member
    trailer_block: parse trailers in one place
    trailer_block: prepare to remove trailer_strings member
    trailer: make find_same_and_apply_arg() do one thing
    trailer: unify global configuration
    interpret-trailers: print error if given unrecognized arguments
    trailer: make entire iterator struct private
    trailer: rename terms for consistency
    trailer: use create_new_target_for() everywhere
    trailer: capture behavior of addIfDifferentNeighbor
    trailer: EXISTS_REPLACE preserves original position
    trailer: capture replacement behavior for multiple matches
    trailer: split template application into 2 steps
    trailer: free templates in one place
    trailer: template's final value does not depend on in_tok

and "trailer formatter: introduce format_trailer_block()" is the one
that refactors the unified formatter introduced in this patch to have
multiple smaller helper functions.